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The  lifting  scheme  is  a  flexible  tool  for  constructing  compactly  supported  second 
generation  wavelets  which  are  not  necessarily  translates  and  dilates  of  one  mother 
wavelet.  The  lifting  scheme  uses  a  simple  basis  function  and  builds  a  better  performing 
one  with  desirable  properties.  Flexibility  afforded  by  the  lifting  scheme  allows  basis 
functions  to  change  their  shapes  near  the  boundaries  without  degrading  regularities.  The 
lifting  scheme  also  provides  fast  processing  by  making  optimal  use  of  similarities  between 
high  and  low  pass  filters.  The  lifting  algorithm  was  originally  introduced  to  construct 
biorthogonal  wavelets  associated  with  interpolating  scaling  functions,  and  was 
generalized  later  using  a  finite  number  of  lifting  steps  by  means  of  the  Euclidean 

ix 


algorithm. 

The  lifting  scheme  utilizes  a  classical  2-channel  filter  bank  as  a  framework  for 
multiresolution  analysis.  However  this  traditional  framework  is  not  translation  invariant. 
Representations  with  a  translation-invariant  characteristic  are  highly  desirable  for  feature 
analysis.  In  this  dissertation  we  address  the  following  question:  Can  the  lifting  scheme  be 
used  as  a  framework  for  overcomplete  wavelet  representations  with  multiscale  feature 
analysis  in  mind?  Our  work  addresses  this  question  by  investigating  each  stage  of  the 
multiscale  analysis:  Split,  Dual  lifting  and  Primal  lifting.  We  introduce  a  smoothing  Lazy 
wavelet  in  the  split  stage  which  does  not  subsample,  but  smooths  an  input  image  so  that 
the  low-pass  channel  contains  some  redundant  information.  Since  the  new  split  operation 
does  not  use  downsampling  to  make  distinct  subsets,  there  is  no  aliasing.  We  show  that  the 
primal  lifting  required  to  compensate  for  the  aliasing  by  preserving  energy  between  two 
contiguous  approximations  needs  no  longer  exist.  Only  the  predict  stage,  based  upon 
specific  properties,  indeed  leads  to  a  useful  multiresolution  analysis  which  is  applicable  to 
multiresolution  feature  analysis.  We  also  show  in  our  work  that  the  proposed  scheme 
achieves  better  performance  without  introducing  boundary  artifacts  that  exist  in  the 
traditional  methods.  This  makes  the  overcomplete  lifted  wavelet  transform  very  viable  in 
interactive  processing  paradigms  such  as  a  Web-based  client-server  model  for  multiscale 
feature  analysis  where  fast  processing  is  highly  desirable. 
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CHAPTER  1 
INTRODUCTION 

Wavelet  functions  vy  .  m  are  traditionally  defined  as  the  translates  and  dilates  of 
one  particular  function  in  L2(R),  mother  wavelet  \\f ,  such  that  m(x)  =  \\f(2?x-m). 
Although  these  wavelets  have  a  nice  property  of  forming  a  Riesz  basis  for  L2(R) ,  discrete 
implementations  typically  require  compromised  solutions  such  as  zero  padding, 
symmetric  extension,  or  periodization  to  overcome  artifacts  around  boundaries.  Further 
more,  the  translation  invariance  desirable  in  multiscale  algorithms  for  multiscale  feature 
analysis  is  not  preserved  in  the  standard  wavelet  transform  as  described  by  Mallat  [19].  In 
addition,  the  standard  algorithm  in  the  filter  bank  structure  is  not  necessarily  the  best  way 
to  implement  a  wavelet  transform.  There  have  been  several  schemes  to  improve  the 
standard  fast  wavelet  transform  such  as  split-radix  FFT  algorithms  and  short-length  FIR 
filtering.  For  long  length  filters,  an  FFT  based  scheme  known  as  Vetterli-algorithm  was 
suggested,  and  for  short  length  filters,  a  fast  running  FIR  algorithm  was  formulated  by 
Mou  and  Duhamel  [21]. 

Recently,  the  concept  of  lifting  was  introduced  by  Wim  Sweldens  [29]  [30]  to 
alleviate  the  problems  associated  with  a  traditional  digital  filter  bank  scheme.  In  this 
scheme,  basis  functions  associated  with  each  wavelet  coefficient  are  no  longer  necessarily 
dilates  and  translates  of  a  mother  wavelet.  Flexibility  afforded  by  the  lifting  scheme 
allows  the  basis  functions  near  boundaries  to  change  their  general  shape  for  better 
accommodation.  Also  the  lifting  scheme  allows  a  faster  processing  by  making  optimal  use 
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of  similarities  between  high  and  low  pass  filters  to  speed  up  the  calculation  of  coefficients. 
Ingrid  Daubecheies  and  Wim  Sweldens  [11]  also  showed  that  the  lifting  is  general  in  that 
every  wavelets  and  wavelet  transform  with  finite  filters  can  be  obtained  with  a  finite 
number  of  lifting  steps. 

In  this  dissertation,  a  comprehensive  study  on  the  construction  of  wavelets  and 
their  associated  transform  using  the  lifting  scheme  shall  be  thoroughly  carried  out  starting 
from  a  biorthogonal  wavelet  construction  and  constructions  of  orthogonal  and 
semiorthogonal  wavelets  with  a  finite  number  of  lifting  steps.  We  show  during  the  study 
how  the  boundary  artifacts  existing  in  a  standard  wavelet  transform  via  a  filter  bank 
structure  are  taken  care  of  with  interpolating  lifting  schemes.  A  new  algorithm, 
Overcomplete  Lifting  Scheme  (OLS),  for  overcomplete  wavelet  representations  is 
formulated  as  a  mathematical  problem.  Error  to  signal  ratio  for  the  reconstructions  is 
computed  to  determine  robustness  of  the  new  framework.  Finally  computational 
complexity  analysis  and  comparisons  are  carried  out  to  identify  the  speedup  the 
overcomplete  lifting  scheme  can  achieve.  From  the  comparison  we  show  that  the  OLS 
with  a  block  algorithm  is  most  efficient. 


CHAPTER  2 
MOTIVATIONS 

In  this  chapter  we  describe  practical  backgrounds  that  have  motivated  our 
research.  We  first  discuss  boundary  effects  resulting  from  various  extensions  to  handle  a 
finite  length  input.  We  then  show  that  overcomplete  wavelet  representations  have  an 
increased  amount  of  data  to  be  represented  and  processed.  Finally,  we  describe  a  demand 
of  high  speed  transform  for  interactive  image  processing. 

Wavelet  theory  provides  a  systematic  framework  for  multiresolution  analysis. 
Wavelets  are  versatile  tools  for  representing  functions  or  data  sets  in  a  multiscale  space. 
The  capability  of  time-frequency  localization  of  wavelets  makes  representations  of  signals 
efficient  [10].  The  justification  of  using  dyadic  wavelets  and  wavelet  transform  has  been 
supported  by  several  preliminary  results  [1]  [2]  [3]  [16]  [17]. 

Several  algorithms  were  reviewed  by  Rioul  and  Duhamel  [23]  for  computing 
various  types  of  wavelet  transforms.  Mallat  [19]  introduced  a  fast  discrete  dyadic  wavelet 
transform  via  a  filter  bank  structure:  the  standard  wavelet  transform,  where  four  filters  are 
needed  to  compute  forward  and  inverse  processing  steps.  While  the  filters  do  not  change 
during  the  computation,  the  computation  of  the  filter  coefficients  is  not  a  trivial  task.  In 
addition,  the  standard  wavelet  transform  suffers  from  boundary  artifacts  if  the  length  of 
input  is  not  power  of  two,  a  common  problem  associated  with  digital  filters  operating  on  a 
finite  sets  of  data.  There  have  been  numerous  attempts  at  minimizing  this  boundary  effect. 
The   most  common   approaches   have  been   zero-padding,   symmetric  extension, 
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periodization  and  smooth  extension.  These  walkaround  approaches  are  effective  in  theory 
but  are  not  entirely  satisfactory  from  a  practical  viewpoint.  Figure  2-1  shows  an  idea  about 
the  signal  end  effects.  In  this  figure,  the  wavelet  coefficients  not  only  capture  the  location 
in  time  of  the  discontinuities,  but  also  display  the  evolution  of  unwanted  boundary 
artifacts  as  the  scale  grows.  Figure  2-2  illustrates  the  boundary  effects  generated  by  some 
signal  extensions:  zero-padding,  symmetrization  and  smooth  padding.  Zero-padding 
assumes  that  the  input  signal  is  zero  outside  the  support.  This  extension  obviously 
introduces  artificial  discontinuities  at  the  border.  Symmetrization,  which  works  if  the  basis 
functions  are  symmetric,  is  to  use  reflection  across  edges.  This  extension  mode  preserves 
continuity,  but  also  introduces  discontinuities  in  the  first  derivative.  Coefficients  created 
by  these  two  extension  modes  shown  in  Figure  2-2(c)  and  Figure  2-2(d)  respectively 
clearly  demonstrate  that  wavelets  designed  for  detecting  singularities  within  finite-length 
signals  likely  consider  the  boundaries  as  other  singularities  of  the  input  signals.  The  third 
extension  method  shown  in  Figure  2-2(e),  smooth  padding,  extends  the  finite  support  of 
the  input  signal  by  a  first  order  derivative  extrapolation.  Figure  2-3  illustrates  the 
boundary  effects  introduced  by  the  extensions  described  applied  to  an  image.  While  the 
end  effects  based  upon  smooth  padding  extension  did  not  appear  in  1-D  signal  case  (see 
Figure  2-2(e)),  some  artifacts  clearly  appear  in  2-D  image  case  (see  Figure  2-3(d)).  Thus 
we  need  basis  functions  that  overcome  the  undesirable  effects  near  boundaries  without  any 
extension. 

The  standard  dyadic  wavelet  transform  given  by 

IW^n)  =  (f(x),<\>/x-ln))\nJ)eZ2  (2J) 
provides  discrete,  uncorrected  and  complete  representation  of  the  function  f{x)  .  One  of 
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Figure  2-1.  Example  of  wavelet  coefficients  showing  boundary  effects 
propagating  through  the  continuous  scale  space. 
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Figure  2-2.  Illustrations  of  boundary  effects  among  different  signal  extension  modes 
at  the  boundaries,  (a)  1-D  original  signal  with  discontinuities  in  the  middle,  (b) 
Cropped  signal  showing  the  discontinuities  in  the  original  signal,  (c)  Wavelet 
coefficients  from  zero-padded  extension.  The  wavelet  coefficients  clearly  show  the 
boundary  effects,  (d)  Detail  coefficients  from  symmetric  extension.  The  detail 
coefficients  show  both  the  boundary  effects  and  the  discontinuities,  (e)  Wavelet 
coefficients  from  smooth  padding.  The  coefficients  show  the  discontinuities,  but  not 
the  boundary  effects. 
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Figure  2-3.  Illustrations  of  boundary  effects  on  an  image  based  on  zero-padding, 
symmetrization  and  smooth  padding  extensions.  Level  5  decomposition  is  performed 
using  Daubechies  tap  5  wavelet,  and  then  reconstruct  the  approximation  of  level  5.  (a) 
Original  image,  (b)  (c)  (d)  Reconstructed  images  using  zero-padding,  symmetrization 
and  smooth  padding  extensions,  respectively. 
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the  main  advantages  of  wavelet  transform  over  Fourier  transform  is  the  time-frequency 
localization.  The  wavelet  transform  based  upon  some  basis  function  \|/  has  a  zoom-in  and 
zoom-out  capability  in  time-frequency  space.  In  other  words,  the  wavelet  transform  has  a 
flexible  time-frequency  window  that  automatically  narrows  at  high  center  frequency  and 
widens  at  low  center  frequency. 

However,  such  representations  may  not  be  well  suited  to  pattern  recognition  or 
feature  analysis  because  of  its  non-invariance  with  translations  [20].  In  order  to  preserve 
translation  invariance,  each  function  W^ix)  is  oversampled  to  give  a  redundant 
representation.  This  representation  however  needs  considerably  increased  amount  of  data 
for  processing.  An  adaptive  sampling  using  local  wavelet  maxima  has  been  proposed  by 
Mallat  and  Zhong  [20].  Since  the  multiscale  edge  representations  are  sparse  compared  to 
original  data,  the  processing  cost  based  on  these  representations  may  be  reduced. 

Benn  et  al.  [1]  used  the  local  maxima  representations  to  extract  volumes  of  interest 
for  automated  knowledge-based  feature  recognition.  We  developed  a  novel  method  to 
represent  multiscale  objects  with  maxima  surface:  Isomaxima  surface. 

We  first  apply  a  1-D  discrete  dyadic  wavelet  transform  in  x,  y  and  z  directions  to  a 
CT  volume  data  set  to  obtain  analyzing  coefficients  (gradient  vectors).  Then  we  compute 
modulus  and  two  directions  from  the  coefficients  that  detect  boundaries  of  objects  by 
locating  a  local  maximum  in  the  direction  of  the  gradient.  The  maxima  are  then 
triangulated  based  on  certain  properties,  such  as  a  (local  or  global)  regularity  of  the 
surface,  directional  constraints  and  a  magnitude  tolerance  between  nearby  maxima. 
Arbitrary  tiangular  meshes  are  converted  to  a  multiresolution  form  that  has  subdivision 
connectivity.  The  multiresolution  mesh  representations  finalized  can  be  used  in 


compression,  incremental  display  and  transmission,  level-of-detail  control  and 
multiresolution  editing. 

3-D  wavelet  modulus  and  angle  are  given  respectively  by 


A/ =  {arctan(a),  arctan(P)}  =  jarctan 


rtff\  (W*f 


,  arc  tan 


The  3-D  maxima  is  then  defined  by 


[  Mjf,     if  Mjf  is  maximum  along  the  gradient  direction  A jf , 
M^     [0,  Otherwise. 

Then  a  maxima  vertex  v„  at  level can  be  found  if  it  satisfies  the  following  condition: 
|M/(vw)  -  M/(vm)|  <  7*  ,  where  vm  is  a  neighboring  maxima  vertex,  T  is  a  threshold 
and  a  is  a  regularity  constant.  The  isomaxima  surface  would  then  be  used  in 
multiresolution  surface  analysis  proposed  by  Eck  et  al.  [14]  and  Lounsbery  et  al.  [18]. 
Figure  2-4(a)  shows  an  input  3-D  volume  from  axial  CT  slices.  Figure  2-4(b)  and  2-4(c) 
show  simulated  output  of  a  3-D  wavelet  representation  and  a  corresponding  3-D 
multiscale  edges,  respectively.  A  3-D  segmentation  of  maxillary  sinuses  is  shown  in 
Figure  2-5. 

However,  in  order  to  construct  3-D  volumes  of  interest  from  one  study  of  a  patient, 
we  need  approximately  140MB  -  1500MB  disk  space.  The  numbers  come  from  the 
following  specifications:  image  size  is  512  by  512,  pixel  depth  2  bytes,  number  of  slices  in 
one  study  30-  120,  3-way  (x,  y  and  z  directional)  transform,  and  level  of  analysis 
practically  goes  from  3  to  8.  The  wavelet  transform  on  the  3-D  data  set  is  thus  very 
computationally  burdensome  and  required  huge  amount  of  auxiliary  memory.  A  wavelet 


Figure  2-4.  (a)  Input  3-D  volume  from  axial  CT  slices,  (b)  3-D  wavelet  coefficients 
simulated  with  2-D  wavelet  coefficients,  (c)  3-D  wavelet  maxima  edges  obtained  from 
2-D  multiscale  edges. 


Figure  2-5.  3-D  volume  of  sinuses  extracted  from  the  local  maxima  representation 
shown  in  Figure  2-4(c). 
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transform  with  an  in-place  implementation  is  demanding. 

The  increased  amount  of  data  for  overcomplete  representations  where  an  FFT  has 
been  the  basic  implementation  tool  are  also  computationally  burdensome  and  inefficient. 
As  pointed  out  by  Rioul  and  Duhamel  [23],  the  FFT  method  performs  best  when  the  filter 
length  is  long  (over  16  taps).  Since  the  filters  we  are  using  in  the  transform  are  compact, 
we  may  have  a  faster  transform  by  avoiding  the  FFT.  Practically,  a  faster  wavelet 
transform  is  needed  for  applications  where  real  time  interaction  is  required.  We  designed  a 
Web  page  (http://www.iprg.cise.ufl.edu/rsna97/)  originally  created  for  RSNA 
(Radiological  Society  of  North  America)  1997  exhibition  in  Chicago,  Illinois.  The  Web 
page  was  designed  such  that  radiologists  and  medical  imaging  professionals  could  learn 
about  and  use  the  multiscale  techniques  for  mammographic  feature  enhancement 
developed  by  Laine  et  al.  [17]  and  Iztok  et  al.  [16].  A  faster  algorithm  was  demanded  to 
accomplish  high-speed  interactive  multiscale  processing  for  the  wavelet-based 
enhancement.  Otherwise,  the  embedded  nature  of  real-time  interaction  of  the  Web-based 
documenting  would  lose  its  virtue. 


CHAPTER  3 

MULTIRES OLUTION  ANALYSIS  AND  WAVELETS 

In  this  chapter,  we  briefly  describe  multiresolution  analysis,  wavelets  and  wavelet 
transforms  that  are  needed  to  understand  and  situate  the  remainder  of  the  dissertation.  We 
first  review  general  properties  of  wavelet  and  various  wavelet  transforms  which  have 
became  known  as  useful  tools  for  various  signal  or  image  processing  applications.  The 
definitions  of  multiresolution  analysis  and  scaling  function  are  given  in  the  next  section, 
where  we  give  the  basic  definition  of  wavelet  as  well.  We  also  study  the  connection 
between  wavelets  and  polynomials,  and  show  how  this  relates  to  the  approximation 
properties  of  wavelet  expansions. 

3.1  Multiresolution  Analysis 

A  multiresolution  analysis  of  L2(R)  is  defined  as  a  sequence  of  closed  subspaces 
Vj  c  L2{R)  ,je  Z,  with  the  following  properties: 

1)  Vj<zVJ+i, 

2)  for  all  j  e  Z,  f(x)  e  V2,  <^f{2x)  e  V^,, 

3)  for  all  /  e  Z,  f(x)  e  V0  <=>/(*  +  /)  e  V0, 

00 

4)  lim  V2i  =    KJ  Vv  is  dense  in  L2(R) , 

J  J  =.  — oo 

oo 

5)  lim  V2j  =    Pl  V2I  =  {0}, 

J  -*  -°°  ;  =  -oo 
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6)  a  scaling  function    6  V0 ,  with  a  non-vanishing  integral,  exists  so  that  the  wavelet 
family  {n/y  k\k  e  Z}  forms  a  Riesz  basis  of  V0 . 

The  first  property  is  a  causality  property.  The  information  of  a  signal  present  at  a 
scale  ;  +  1  contains  all  necessary  information  to  generate  the  same  signal  at  a  coarser  level 
; .  The  properties  (2)  and  (3)  are  scale  and  shift  invariance,  respectively.  In  other  words, 
the  property  (2)  tells  us  that  the  operation  of  multiresolution  analysis  is  unique  at  all 
resolutions,  so  that  each  space  of  approximated  functions  in  L2(R)  should  be  able  to  be 
derived  from  one  another.  The  property  (3)  is  an  isomorphism.  When  f(x)  is  translated  by 
a  length  proportional  to  7) ,  the  approximation  of  f(x)  is  translated  by  the  same  amount 
and  is  characterized  by  the  same  samples  which  have  been  translated.  The  properties  (4) 
and  (5)  are  about  the  convergence.  When  computing  an  approximation  of  f(x)  at 
resolution  D ,  some  information  about  f(x)  is  lost.  Property  (4)  tells  that  as  the  resolution 
increases  to  +°° ,  the  limiting  signal  should  converge  to  the  original  signal.  Property  (5)  is 
the  converse  of  property  (4),  that  is,  as  the  resolution  decreases,  the  approximation 
contains  less  and  less  information  and  converges  to  zero. 

Property  (1)  says  that  if  (p(x)  is  in  V0,  then  it  is  also  in  V, .  The  scaling  function 
thus  satisfies 

(p(x)  =  2£/^(p(2;c-fc)  .  (3.1) 

This  is  the  dilation  equation.  It  is  a  two-scale  difference  equation,  involving  x  and  2x .  It 
is  also  called  a  refinement  equation,  because  it  displays  cp(x)  in  the  refined  space  Vx .  We 
refer  to  h  as  the  filter  associated  with  the  scaling  function  cp .  Under  general  conditions, 
the  scaling  function  is  uniquely  defined  by  the  refinement  equation  and  the  normalization, 
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such  that 

rq(x)dx  =  1  . 
—JO 

This  implies  that  ^<$(x-k)  =  1 ,  so  that  as  given  in  property  (6),  the  integer  translates 

k 

of  the  scaling  function  {(p(x  -  l)\l  e  Z}  form  a  Riesz  basis  for  the  closure  of  their  span, 
and  also  partition  unity.  The  refinement  equation  (3.1)  is  equivalent  to 

where  /i(co)  is  defined  as 

/i((0)  =  JVHttl  • 

Since  cp(0)  =  1 ,  iteration  of  the  refinement  relation  given  in  (3.1)  yields  the  product 
formula  for  the  scaling  function  as 

oo 

9(co)  =  Y[h{2-}<a),  (3.2) 

which  can  be  used  to  construct  (p(x)  from  the  sequence  {hk}  . 

Multiresolution  analysis  can  also  be  achieved  by  building  a  multiresolution 
representation  based  on  the  difference  of  information  available  between  two  contiguous 
resolutions  2J  and  D 1  +  1 .  The  difference  of  information  is  called  the  detailed  signal  at 
resolution  1)  which  is  given  by  the  orthogonal  projection  of  the  original  signal  on  the 
orthogonal  complement  space  of  V-  in  V,  +  j .  Let  Oj  denote  this  orthogonal  complement 
space.  Then  it  satisfies 

Vj+l  =  VjQOj. 

Since  the  space  Oj  contains  the  detail  information  needed  to  generate  an  approximation  at 
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resolution  +  1 ,  it  also  satisfies 

©  O:    =    L2(R)  ■ 

J 

A  wavelet  \j/(jc)  is  an  orthonormal  basis  of  Oj  needed  to  compute  the  orthogonal 
projection  of  a  function  f(x)  on  0. .  The  collection  of  wavelets  -  /)  \l  e  Z}  forms  a 
Riesz  basis  of  L2(R)  as  well.  The  wavelet  \|/  also  satisfies  the  refinement  relation 

=  2£g^(2jc-*),  (3.3) 
k 

where  we  refer  to  g  as  the  filter  associated  with  \|/ .  The  refinement  equation  is  equivalent 
to 

where  g( co)  =  ^ske~'ka>- 

k 

1.2  Wavelet  Rases 

We  described  in  the  previous  section  that  the  basis  functions  for  the  space  V  are 
called  scaling  functions.  We  defined  the  wavelet  spaces,  denoted  by  Wj ,  where  detail 
information  between  two  contiguous  spaces  is  projected  as  an  orthogonal  complement  of 
VJ  in  W  +  1 .  Thus,  any  function  in  V' +  1  can  be  written  as  the  sum  of  a  function  in  VJ  and 
a  function  in  W .  The  functions  we  choose  as  a  basis  for  W  are  called  wavelets.  In  this 
section,  we  review  general  properties  of  the  wavelet. 

Wavelet  transforms  compute  coefficients  that  are  inner  products  of  a  time-varying 
signal  x{t)  and  a  family  of  wavelets 

W)  =  lfll"1/2H^)'  (3-4) 
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where  \|/(r)  is  the  prototypical  analyzing  wavelet,  or  called  a  mother  wavelet,  a 
represents  a  scale  factor  in  time,  and  b  a  time  location.  The  factor  |a|_1/2  is  used  for 
energy  preservation.  The  admissibility  condition  is  required  on  for  it  to  be 

considered  as  an  analyzing  wavelet  [10],  which  is  given  as 

C    -flSfSEfco.,  (3.5) 

where  vj/(co)  is  the  Fourier  transform  of  \}/(r).  This  condition  ensures  that  the  wavelet 
transform  is  a  bounded  and  invertible  operator  in  L2(R) .  In  particular,  if  \\f(t)  must  be  a 
window  function,  then  y(t)  is  necessarily  in  L^R),  meaning  j\\\i(t)\dt<°°,  so  that 
vff(co)  is  differentiable.  Hence  from  (3.5)  it  must  vanish  at  the  origin:  that  is, 

\y(t)dt  =  0  .  (3.6) 

In  order  for  (3.6)  to  be  satisfied,  y(f)  must  be  oscillated.  Since  \j/(0)  =  0  and  the  power 
spectrum  |\j/(od)|  decays  at  high  frequencies,  admissible  wavelets  often  act  the  way  that 
band-pass  filters  do. 

There  is  an  outstanding  property  in  the  wavelet  basis:  time -frequency  localization 
with  varying  window  shape  due  to  different  time  and  frequency  intervals,  which  makes 
the  wavelet  and  wavelet  transform  more  viable  than  the  short-time  Fourier  transform 
(STFT)  which  uses  a  constant  time-frequency  window.  The  variation  of  resolution  in  the 
time-frequency  phase  space  enables  the  wavelet  transform  to  zoom  into  the  irregularities 
of  the  signal  and  characterize  them  locally.  To  explain  this  property,  let  x0  and  (d0  be 
finite  centers,  and  Ax  and  finite  radii  of  a  wavelet  \\f,  respectively.  Then  they  are 
defined  as 
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Xo  = 


jj"  x\y(x)\dx , 


and 


The  energy  of  the  continuous  wavelet  transform  defined  in  (3.13)  is  mostly  distributed  in  a 
time-frequency  window  defined  by  the  time  interval  [b  +  axa  -  aAx,  b  +  ax0  +  aAx]  and 
the  frequency  interval  ^[coo  -  Am,  coo  +  AJ  .  The  importance  of  the  time-frequency 
window  is  that  it  widens  for  small  center-frequency,  and  narrows  for  large  center- 
frequency  — .  This  is  shown  in  Figure  3.4.  However  the  area  of  the  time-frequency 
window  is  constant  and  given  by  4AxAa .  This  is  exactly  what  is  most  desirable  in  time- 
frequency  analysis.  The  constant  windowing  area  suggests  that  the  product  of  the  time 


co  n 


CO, 


CO. 


CO. 


b^+a^XQ     xo         b2  +  a2x0 


Figure  3-1.  Time-frequency  windows. 
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interval  and  the  frequency  interval  is  a  stable  quantity.  The  Heisenberg  Uncertainty 
Principle  sets  a  specific  lower  bound  for  the  product,  which  is  given  as 

A,Aro>l. 

From  the  admissibility  condition  given  in  (3.5)  and  \j/(0)  =  0,  the  frequency 
localization  of  wavelets  is  much  better  than  described  above.  Since  coo  is  zero  in  most 
cases,  the  frequency  interval  is  further  divided  into  a  subband  ^  { [-co2,  -co ,  ]     [  co2,  to ,  ] } 

There  are  three  categories  of  wavelet  bases:  orthogonal,  semiorthogonal  and 
biorthogonal.  An  orthogonal  basis  is  one  in  which  each  basis  function  is  orthogonal  to 
every  other  basis  function.  An  orthogonal  multiresolution  basis  is  one  in  which  the  scaling 
functions  are  orthogonal  to  one  another,  the  wavelets  are  orthogonal  to  one  another,  and 
each  of  the  wavelets  is  orthogonal  to  every  scaling  function.  Wavelets  that  satisfy  these 
conditions  are  called  orthogonal  wavelets.  These  conditions  can  be  equivalently  written 
for  every  j,  k  and  /  as 

MlvO  =  Kn  (3.7) 

where  j,k,le  Z.  The  Haar  wavelet  and  the  Daubechies  wavelets  are  the  orthogonal 
wavelets. 

Orthogonality  is  not  the  only  desirable  property  in  a  wavelet  basis.  There  are  some 
other  properties  that  may  be  more  important  when  constructing  a  wavelet  for  a  particular 
application:  compact  support,  symmetry,  smoothness  and  vanishing  moments.  A  more 
compact  support  improves  the  efficiency  of  decomposing  and  reconstructing  functions 
using  a  filter  bank  algorithm.  Smooth  functions  are  best  represented  with  smooth  bases. 
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However,  greater  smoothness  comes  at  the  expense  of  wider  supports.  Wavelets  with  more 
vanishing  moments  are  often  desirable  for  applications  that  require  numerical 
approximations  of  smooth  operators.  It  is  known  that  there  are  no  wavelet  bases  except  the 
Haar  wavelet  that  are  orthogonal,  compactly  supported  and  symmetric  at  the  same  time  [9] 
[10].  For  those  applications  that  need  to  use  smooth  symmetric  wavelets  with  compact 
supports,  orthogonality  may  be  sacrificed.  Wavelets  that  are  orthogonal  to  all  scaling 
functions,  not  necessarily  orthogonal  to  each  other  are  called  semiorthogonal  wavelets. 
Thus  the  only  condition  required  of  the  semiorthogonal  wavelets  is  (<]){  |  \|fj 1 )  =  0  for  all 
k  and  /.  Spline-based  wavelets  are  the  semiorthogonal  wavelets. 

Semiorthogonal  constructions  however  take  quadratic  time  in  analysis  although 
reconstruction  can  be  done  in  linear  time.  Cohen,  et  al.  [7]  developed  biorthogonal 
wavelets  that  although  not  orthogonal  to  scaling  functions,  still  have  many  of  the  desirable 
properties  of  the  semiorthogonal  wavelets.  Dual  basis  functions  are  central  to  the 
definition  of  a  biorthogonal  wavelet  construction.  Let  u(x)  be  a  set  of  orthonormal  bases 
such  that  u(x)  =  [«,(*)  u2(x)  ...].  Then  a  function  f(x)  can  be  represented  with 
coefficients  ci  in  the  span  of  u(x): 

i 

A  coefficient  Cj  can  be  determined  by  the  inner  product  of  the  function  with  Uj(x) : 

(f\uj)  =  £c,<  Uj  |  Uj) 

=  lcfiUJ  (3-9) 

i 

=  Cj- 

If  u(x)  is  not  an  orthonormal  basis,  then  <  /  |  Ui)±Cy  However  we  can  find  a  set 
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of  bases  u(x)  =  [ux(x)  u2(x)  ...]  that  computes  the  coefficient  cj 

cj  =(f\uj).  (3-10) 
Such  functions  u(x)  are  called  the  dual  basis.  By  taking  the  inner  product  of  both  sides  of 
(3.10),  we  see  that 

<«,|5,>- V  (3-lD 
A  biorthogonal  wavelet  basis  is  one  in  which  the  primal  scaling  functions  are 
orthogonal  to  the  dual  wavelets  and  the  primal  wavelets  are  orthogonal  to  the  dual  scaling 
functions.  In  mathematical  notation,  the  conditions  defining  a  biorthogonal  wavelet  basis 
can  be  written  for  every  j,  k  and  /  as 

<M\H)  -  hi 

MivO  =  hi  (312) 
<<Hl¥i>  =  o 
Ml#>  =  o. 

Deciding  whether  to  use  an  orthogonal,  semiorthogonal  or  biorthogonal  wavelets 
is  not  a  trivial  task,  since  each  has  its  own  advantages  and  disadvantages.  One  of 
advantages  of  orthogonal  wavelets  is  that  L2  norm  can  be  easily  computed  from  the 
coefficients.  Semiorthogonal  wavelets  are  useful  in  that  they  provide  compactly  supported 
and  symmetric  bases  while  maintaining  orthogonality  between  wavelets  and  scaling 
functions.  Biorthogonal  wavelets  are  the  most  flexible  ones  among  all.  Lifting  that  will  be 
discussed  in  detail  later  provides  a  flexible  tool  to  construct  the  biorthogonal  wavelets  by 
improving  such  properties  as  locality  of  support  and  degree  of  orthogonality. 
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3  3  Wavelet  Transforms 

Based  on  the  type  of  the  parameter  pair  (a,  b)  in  Equation  (3.4),  we  produce  a 
different  type  of  wavelet  transform:  continuous  wavelet  transform  (CWT),  wavelet  series 
transform  (WST)  and  discrete  wavelet  transform  (DWT).  In  the  CWT,  time  t  and  the  time- 
scale  parameter  pair  (a,  b)  vary  continuously,  which  is  given  as 

CWT{x(t),(a,b)}  =  \ya  b(t)x(t)dt ,  (3.13) 

where  \j/(f)  is  complex  conjugate  of  \|f(f),  and  b  e  R,  a  >  0 .  In  WST,  a  has  the  form 
a  =  2J  where  is  termed  the  octave  of  the  transform,  and  b  a  multiple  of  a :  that  is, 
b  =  nV .  Here,  j,ke  Z .  The  WST  is  thus  given  as 

WST{x(t),(2i,k2j)}  =  2-j/2^(2Jt-n)x(t)dt .  (3.14) 
In  DWT,  both  time  and  time-scale  parameters  are  discrete.  The  DWT  is  thus  given  as 

DWT{x[n],(2J,k2i)}  =  2-j'2^(2Jk-n)x[k]  .  (3.15) 

k 

The  sample  rate  has  been  set  to  one.  From  the  implementation  point  of  view,  the  DWT  is 
in  fact  the  same  as  an  octave-band  filter  bank  which  is  computationally  efficient.  As 
pointed  out  by  Rioul  and  Duhamel  [23],  if  a  wavelet  transform  is  the  DWT,  then  the 
implementation  is  thus  likely  to  be  efficient  and  fast.  The  arithmetic  complexity  of  various 
DWT  algorithms  will  be  shown  in  the  next  chapter. 

A  function  can  be  reconstructed  from  its  wavelet  coefficients.  With  the  constant 
Cv  satisfying  (3.5),  we  have  the  following  reconstruction  formula: 

f{x)  =  jr f  f  Wyf(a,  b)va>  b(a,  b)^f-  ,  (3. 16) 

where  W^f(a,  b)  is  a  set  of  wavelet  coefficients.  Since  the  CWT  can  only  be  computed  on 
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a  discrete  grid  of  points,  we  focus  on  use  of  the  DWT  for  recovery  of  the  function.  Since 
only  partial  information  of  W^f  is  available  in  the  DWT,  the  basis  wavelet  \\f  must  satisfy 
a  stronger  condition  than  the  admissibility  condition  (3.5)  for  a  reconstruction  formula  to 
be  available.  One  of  the  important  properties  to  be  considered  in  this  case  is  its 
completeness  or  stability.  In  order  to  establish  stable  representations,  the  basis  must  form  a 
Riesz  basis  satisfying 

A\\f\\2<Z\(f>Va,t)\2^BM2,  (3-17) 
where  A  and  B,  with  0<A<fl<°°,are  constants  independent  of  to ,  and  both  a  and  b 
are  dyadic  numbers.  The  wavelet  family  b\a,  be  Z}  constitutes  a  frame  if  it  satisfies 
(3.17).  If  A  =  B ,  then  the  frame  is  called  a  tight  frame.  Thus,  for  any  fe  L2{R) ,  we  have 

f=     X  <f>Va,b>Va,b> 
a,  be  Z 

where  \j/  is  a  dual  of  .  Here  if  A  =  B  =  1  and  \j/fl  b  =  \j/fl  fc ,  then  the  wavelet  family 
\ya  b  constitutes  an  orthonormal  basis. 


CHAPTER  4 

COMPUTATIONAL  COST  OF  DISCRETE  WAVELET  TRANSFORMS 

The  discrete  wavelet  transform  (DWT)  has  been  known  as  a  natural  wavelet 
transform  for  discrete-time  signals.  In  this  chapter,  we  study  computational  cost  of  various 
fast  algorithms  to  implement  the  DWT.  We  define  the  computational  cost  as  the  sum  of  the 
number  of  multiplications  and  the  additions. 

4  1  nirprt  Filtp.r  Rank  Algorithm 

Among  several  implementation  structures  for  the  DWT,  the  octave-band  filter 
bank  shown  in  Figure  4-1  has  been  recognized  as  a  general  and  regular  structure  where 
basic  computational  cells  for  the  forward  and  inverse  DWT  are  repeated  without  any 


DWT  Cell 


{Y.ljeZ} 

J  L 

Wavelet 
Coefficients 


IDWT  Cell 


Figure  4-1.  Basic  computational  cells  for  the  forward  DWT  and  the  inverse  DWT 
(IDWT). 
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modification  throughout  the  DWT  implementation.  Although  this  general  method  is 
already  efficient,  several  authors  [21]  [22]  [23]  have  shown  that  there  is  a  room  for 
noticeable  computational  savings  by  FFT-based  fast  convolution  techniques. 

Before  calculating  computational  cost  of  efficient  algorithms,  we  assume  for 
simplicity  that  signal  and  finite-length  filters  are  real.  This  assumption  applies  in  general, 
such  that  it  can  be  extended  to  the  computation  of  the  cost  for  the  complex-valued  case  for 
which  we  only  need  to  know  additional  multiplications  and  additions  required.  For 
straightforward  filter  bank  algorithms,  each  complex  multiplication  requires  four  real 
multiplications  and  two  real  additions,  and  each  complex  addition  requires  two  real 
additions. 

One  other  assumption  to  make  is  that  the  filters  involved  in  the  computation  of  the 
DWT  have  equal  length.  This  is  true  for  the  orthogonal  DWT,  while  in  the  biorthogonal 
case  the  filter  lengths  may  differ.  However  the  equal  length  restriction  can  hold  for  the 
biorthogonal  case  as  well  by  padding  shorter  filters  with  zeros  to  have  the  same  length  as 
the  longest  filter. 

An  algorithm  performs  faster  and  is  more  efficient  than  others  if  there  is  some 
reduction  of  computational  cost  for  the  algorithm.  Here  the  cost  is  usually  derived  based 
on  the  number  of  real  multiplications  and  real  additions  per  input  point  required  by  the 
algorithm.  With  today's  technology,  multiplications  are  no  longer  dominant  part  of  the 
computations  since  a  parallel  multiplier  is  built  in  the  modern  computers.  Thus  a  useful 
criterion  is  the  sum  of  the  number  of  multiplications  and  additions.  Although  it  is  not  the 
only  relevant  criterion,  the  computational  cost  is  fairly  instructive  way  for  comparing 
various  DWT  algorithms. 
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The  DWT  implementation  on  the  filter  bank  is  known  fast.  An  intuitive  reason  is 
that  the  DWT  uses  the  decomposition  of  the  computation  into  basic  computational  cells 
and  the  decimations  within  the  cell.  Let  us  suppose  that  the  filters,  h[n]  and  g[n] ,  have 
equal  length  L.  Then  a  point  in  the  DWT  cell  requires  2L  multiplications  and  2 (L  -  1) 
additions  for  every  set  of  two  inputs.  If  we  let  Mfrect  and  Afrect  be  the  number  of  real 
multiplications  and  additions  per  input  point  for  each  computational  cell  at  the  ;  th  octave, 
respectively,  then  for  ;  =  1,2,...,/ 

Mdirect  _  i 

1  (4.1) 

Adirect  =  L-l  . 

That  is,  the  number  of  operations  per  input  point  for  each  computational  elementary  cell 
is  given  by 

nOPsdirect  =  (Le(L-l)),  (4.2) 
where  nOPsdirect  stands  for  the  number  of  operations  involved  in  the  direct  filter  bank, 
and  (m  ©  a)  denotes  m  multiplications  and  a  additions  per  a  point  for  an  elementary 
cell. 

The  input  to  the  DWT  cell  at  the  ;'th  octave  is  downsampled  by  2>~ 1 .  Thus  the 
total  number  of  input  points  involved  in  the  DWT  on  /  octaves  is  2N{  1  -  2~J) ,  where  N 
is  the  size  of  the  original  input.  If  we  denote  by  Cdirecl  the  total  cost  of  the  standard  DWT 
algorithm  on  J  octaves  implemented  directly  as  a  filter  bank,  then 

C  direct  =  2N{\-2~J){L®L-\).  (4.3) 
As  the  number  of  octaves,  J ,  increases,  Cdirect  is  bounded  to  2N(L  ©  (L  -  1 )) .  Since  the 
filter  length  is  the  major  factor  in  comparing  DWT  algorithms,  we  consider  W  as  a 
constant.  Thus 
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0(Cdirect)~L.  (4.4) 
The  DWT  (3.15)  can  be  written  in  terms  of  an  associated  filter  gj[n]  by 

DWT{x[n],  (21,  *20>  =  ^x[n)gj[n  -  2>k]  .  (4.5) 

n 

In  the  naive  computation  of  DWT  (4.5),  the  length  of  the  filter  gj[n]  at  the  yth  octave  is 
(L-l)(2/'-l)  +  l.Thus 


Mdirect  ■  (I-L)(2/-l)+l  , 
Adirect  -  (L-  \  )(2>  -  1)  . 


(4.6) 


Let  C„oive  be  the  total  cost  of  the  naive  DWT  algorithm  on  J  octaves.  Then 

Cnaive  =  (WW  -  1)  +  O  ©  (JN(L-  1)))  .  (4.7) 

Thus 

W^WL.  (4-8) 
The  complexity  (4.8)  does  not  bound  as  the  J  increases.  Based  on  the  complexities 
given  by  (4.4)  and  (4.8)  as  big  O  notations,  the  filter  bank  algorithm  implementing  the 
DWT  achieves  a  speedup  by  a  factor  of  J  over  the  naive  computation  of  the  DWT 
algorithm  by  reducing  the  complexity  from  JL  to  L . 

4  ?  EELha&ed  nwT  Algorithm 

Although  the  filter  bank  algorithm  described  in  the  previous  section  already 
achieves  a  fast  implementation,  FFT-based  algorithms  lead  to  further  reduction  of  the 
computational  complexity  of  the  DWT  from  L  to  logL .  In  this  section  we  introduce  the 
split-radix  FFT  algorithm  [22]  which,  among  all  practical  FFT  algorithms,  has  the  best 
known  complexity  for  lengths  N  =  2n  [23].  Before  going  into  more  details  on  the 
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algorithm,  we  need  to  reorganize  the  filter  bank  structure  shown  in  Figure  4-1  to  be  more 
efficient.  Since  each  computational  cell  contains  two  filters  followed  by  subsampling,  a 
polyphase  representation  is  better  in  terms  of  efficiency.  Filtering  in  the  polyphase 
implementation  structure  is  applied  to  each  of  the  even  and  odd  subsampled  sequence 
separately.  This  already  reduces  the  computation  by  a  factor  of  two. 
The  polyphase  representation  of  x  is  given  by 

X(z)  =  Xe(z2)  +  z-lX0(z2),  (4-9) 
where  Xe  and  Xa  contain  even  and  odd  sequence,  respectively.  In  other  words, 

Xe(z)  =  Y,x2kz~k  , 

k  (4.10) 

k 

Let  us  denote  by  A(z)  the  cell  output  that  enters  the  next  stage,  and  by  B(z)  the  cell 
output  that  computes  wavelet  coefficients  at  the  current  stage.  A(z)  (respectively,  B(z) )  is 
obtained  by  filtering  by  H(z)  (respectively,  G(z) )  followed  by  subsampling,  that  is 

A(z)  =  He(z)Xe(z)  +  z-lG0(z)X0(z)  , 
B(z)  =  Ge(z)Xe(z)  +  z-lH0(z)X0(z)  . 

Figure  4-2  shows  the  polyphase  structure  for  the  forward  DWT  based  on  (4.11)  that 
reorganizes  the  DWT  cell  shown  in  Figure  4-1. 

Many  FFT  algorithms  based  on  decimation-in-time  and  decimation-in-frequency 
use  sequences  whose  length  is  power  of  two.  There  have  been  FFT  algorithms  that  make 
the  discrete  Fourier  transform  efficient  if  N  can  be  represented  as  a  product  of  factors.  The 
radix-R  FFT  algorithm  is  one  of  them  that  uses  a  product  of  an  identical  factor  (i.e., 
N  =  Rv).  The  radix-2  and  radix-4  algorithms  are  the  ones  that  have  been  mostly  used  for 
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Figure  4-2.  Forward  polyphase  representation  reorganized  to  allow  fast  filtering 
algorithms. 


practical  applications.  The  split-radix  FFT  algorithm  [22]  combines  them  in  such  a  way 
that  it  requires  the  lowest  number  of  multiplications  and  additions  and  keeps  the  same 
regularity  as  the  radix-4  algorithm  and  the  same  flexibility  as  the  radix-2  algorithm.  It  uses 
a  fact  that  a  radix-4  is  better  for  odd  terms  of  the  DFT  and  a  radix-2  for  the  even  terms  of 
the  DFT.  If  we  define  the  complex  quantity  in  the  DFT  as  WN  =  e~/(2*/JV) ,  the  split-radix 
is  decomposed  into 

N/2-i 

X2k  =     X    {*n  +  Xn  +  (N/2))WNnk  ' 
n  =  0 
N/4-1 

X4k+\  ~      X    [(xn-Xn  +  (N/2))-j(Xn  +  (N/4)-Xn  +  (3N/4)NWNWNnk  '  (4'12) 
n  =  0 
N/4-l 

X4k  +  3  =      X    [(Xn~Xn  +  (N/2))+j(Xn  +  (N/4)-Xn  +  (3N/4))]WN"WNnk- 
n  =  0 

The  split-radix  decimation-in-frequency  decomposition  replaces  a  DFT  of  length  N  by  one 
DFT  of  length  N/2  and  two  DFT's  of  length  4/N.  Let  RMnFFT  and  RAnFFT  be  the 
number  of  real  multiplications  and  additions,  respectively,  needed  to  perform  a  2" -real 


30 


DFT  with  the  split-radix  algorithm.  Then  the  split-radix  FFT  for  real  data  requires  [22] 
[23] 


For  complex  data, 


^fft  =  2"-i(„-3)  +  2, 
^fft  =  2"-!(3n-5)  +  4 


^fft  =  2«(n-3)  +  4, 
^fft  =  32»(n-l)  +  4, 


(4.13) 


(4.14) 


where  M*/7  and  (A™7  are  the  number  of  complex  multiplications  and  additions. 

In  order  to  compute  the  computational  cost  of  FFT-based  DWT  algorithms,  Figure 

4-3  shows  an  alternative  implementation  structure  of  the  DWT  cell  depicted  in  Figure  4-2. 

N   

The  input  of  size  N  is  split  into  even  and  odd  sequences,  and  a  length  -  FFT  is  applied  to 
each  sequence  separately.  The  four  convolutions  with  four  filters  in  frequency  domain  are 
followed.  This  requires  four  —  complex  multiplications.  Two  blocks  are  then  added  which 
requires  two  -  complex  additions,  and  finally  two  ^  inverse  FFT's  are  applied.  A 
complex  multiplication  requires  four  real  multiplications  and  two  real  additions.  The  total 
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Figure  4-3.  FFT-based  implementation  structure  of  the  DWT  cell  shown  in  Figure  4-2. 
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number  of  operations  for  an  input  of  size  N  involved  in  the  FTT-based  implementation 
structure  shown  in  Figure  4-3  are  thus  given  as 

2  •  FFTN/2  +  4  •  {Amults  +  ladds)  ■  |  +  2  •  2  •  ^adds  +  2  ■  IFFTN/2  .  (4.15) 

Here  we  assume  that  the  FFT's  of  the  four  filters  are  precomputed.  For  general  purpose, 
we  also  assume  that  all  inputs  for  the  FFT  and  IFFT  are  complex,  so  that  we  use  the  cost  on 
complex  data  (4.14).  The  total  number  of  operations  per  input  point  for  each  basic 
computational  cell  is  thus  given  as 

f(2"+1Wn+16)  ©  (3-2"  +  '(n-l)+16^  _  ^ 

Since  L  has  the  same  order  of  magnitude  as  N,  n  =  logL.  If  we  substitute  every  n  in 

(4.16)  with  logL,  the  number  of  operations  per  input  point  in  the  FFT-based 

computational  cell  nOPsFFT  is  given  by 

//2Llog2L+  16\     /6L(log2L-l)+  16\\ 
n0PsFFT  =   0   2-T   •  (4- 17) 


Table  4-1  shows  the  resulting  computational  costs  for  different  lengths  L  in 
comparison  with  the  direct  method  (4.2)  and  with  the  FFT-based  algorithm  (4.17).  The 
comparison  tells  us  that  the  FFT  version  of  the  DWT  algorithm  is  efficient  for  filters  of 
length  greater  than  or  equal  to  12  (L>  12).  Figure  4-4  shows  graphically  the  costs  of 
nOPsdirect  and  nOPsFFT  based  on  the  results  in  Table  4-1. 

A  3  Short-length  FTR  Filtering 

Since  the  FFT-based  algorithms  are  no  longer  efficient  for  short  filters,  it  is 
appropriate  to  apply  a  specific  class  of  fast  algorithms  to  the  computation  of  the  DWT  for 
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i  j 
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ro  A  111  -  21 

18 

ns  (ft  Hi  —  is 

fQ23  A  19911  =  29  14 

(2x3x3) 

(8©  17)  =  25 

20 

(20©  19)  =  39 

(9.44  ©20.73)  =  30.18 

(5x2x2) 

(7.2  ©21.4)  =  28.6 

22 

(22  ©21)  =  43 

(9.65  ©21.48)  =  31.13 

24 

(24  ©23)  =  47 

(9.84  ©22.18)  =  32.01 

(2x2x3x2) 

(12©  18)  =  30 

26 

(26  ©25)  =  51 

(10.02  ©22.82)  =  32.83 

28 

(28  ©27)  =  55 

(10.19  ©23.42)  =  33.6 

30 

(30  ©29)  =  59 

(10.35  ©23.97)  =  34.32 

(5x3x2) 

(9.6  ©27)  =  36.6 

32 

(32  ©31)  =  63 

(24.5  ©24.5)  =  35 

(2x2x2x2x2) 

(18  e  22)  =  40 

Table  4-1.  Computational  costs  per  point  and  per  elementary  cell  for  the  direct  filter 
bank  algorithm  (4.2),  the  FFT-based  algorithm  (4.17)  and  the  fast  running  short-length 
FIR  filtering  algorithm  (4.19). 
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Figure  4-4.  Comparison  of  computational  cost  of  the  direct  filter  bank  method,  the  FFT 
version  of  the  DWT  algorithm,  and  the  fast  running  short-length  FIR  filtering. 


short  filters.  Although  the  resulting  gain  is  fairly  modest,  it  is  still  interesting  when  heavy 
computation  of  short  filters  is  required  [23].  Fast  running  short-length  FIR  algorithms  [21] 
are  the  ones  among  the  class.  A  general  description  of  the  algorithm  follows:  medium  or 
long  filters  are  decimated  into  short-length  FIR  filters  with  reduced  computational  cost 
where  all  multiplications  are  replaced  by  decimated  subfilters.  Since  the  process  can  be 
repeated  on  the  subfilters,  the  short  filters  become  basic  elements  in  deriving  these  fast 
algorithms.  One  good  aspect  of  the  fast  running  FIR  algorithms  is  that  they  allow  various 
trade-offs  between  structural  regularity  and  arithmetic  efficiency.  Figure  4-5  [23]  shows  a 
simple  example  of  the  short-length  FIR  filtering  algorithm  with  the  decimation  ratio  is 
two. 
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Figure  4-5.  An  example  of  fast  running  FIR  filtering  algorithm  for  short-filters.  Here 
decimation  ratio  is  two. 


Let  R  be  a  decimation  ratio  and  L  the  length  of  a  filter  where  L  =  kR,  and  k  is  a 
positive  integer.  We  denote  by  F(R,  L)  an  algorithm  computing  R  outputs  of  a  length-L 
FIR  filter.  We  denote  by  MsRhoLrt  and  AsRhoLrt  the  number  of  multiplications  and  additions, 
respectively,  of  the  F(R,  L)  per  output  of  a  length-L  filter.  L  can  be  factored  by  various 
7?'s  such  that 


L  =  R]R2---Ri---RrLr , 
where  r  is  a  positive  integer.  Then  MsRhoLrt  and  AsRhoLrt  can  be  written  as  [21] 


(4.18) 


Mskort  =  MdirectJ^Mfhor, , 


i=  1 
r  i- 


ARhTi  =  x  AR*°rt  x  mr°r  +  (Mfirect  - 1 )  n  mrx  - 


(4.19) 


where  Mfrect  denotes  the  cost  of  the  direct  filter  bank  algorithm  for  length-r  filters  (4.1). 
Mou  and  Duhamel  [21]  developed  the  short-length  FIR  filtering  structures  to  implement 
algorithms  of  F(2,  2) ,  F(3,  3)  and  F(5,  5)  that  are  most  useful  short-length  filters 
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without  any  expense  of  structural  regularity.  Their  computational  costs  are  given  by 

(M^f'SA^f")  =  (1.502)  , 

(Mf*f 0  Afjf*)  =  (2  0  3.3)  ,  (4.20) 
(M^f  ©A^f ')  =  (2.4  8  8)  . 

Table  4-1  lists  the  resulting  costs  of  the  short-length  FIR  filtering  algorithm.  The 
result  shows  that  the  short-length  FIR  algorithm  is  more  efficient  than  the  FFT-based 
DWT  algorithms  for  lengths  up  to  L  =  30 .  Rioul  and  Duhamel  [23]  pointed  out  that  the 
short-length  FIR  algorithms  could  save  computational  cost  up  to  30%  for  short-length 
filters.  Since  practical  applications  generally  use  short  filters  to  compute  the  DWT's,  such 
a  gain  can  make  a  huge  difference  in  the  performance  of  the  DWT's  when  heavy 
computation  is  involved. 


CHAPTER  5 
MULTISCALE  BLOCK  ALGORITHM 

In  this  chapter  we  describe  the  design  of  a  high-speed  algorithm  to  support 
interactive  analysis  via  multiscale  processing.  In  addition  to  writing  multi-threaded  code, 
based  on  kernel  support  for  multiple  threads  of  control  and  changing  the  scheduling  class 
policy  to  real-time  mode,  our  algorithm  uses  hardware  and  software  support  for  high- 
speed image  manipulation,  and  achieves  the  best  performance  from  the  combination  of  all. 
This  algorithm  is  utilized  later  to  provide  the  best  performance  in  computing  the  discrete 
wavelet  transform  for  overcomplete  representations. 

We  first  review  the  architecture  of  the  SPARCstation  SX  system  which  we  have 
used  as  a  base  platform  in  developing  most  of  our  algorithms.  And  we  describe  briefly 
XEL  foundation  graphics  interface  and  relevant  functions  accelerated  on  the  SX  system 
processor.  We  then  introduce  a  new  algorithm:  multiscale  block  algorithm,  to  accomplish 
the  forward  and  inverse  discrete  dyadic  wavelet  transforms  that  exploits  the  parallel 
architecture  of  the  machine.  We  show  how  this  multiscale  block  algorithm  works  and 
demonstrate  the  amount  of  speedup  achieved  over  traditional  linear  convolution  schemes 
and  conventional  FFT  methods. 

S.I  Architectural  Overview  of  SPARCstation  SX  System 

The  SPARCstation  SX  graphics  processor  is  a  scalable  graphics  architecture 
consisting  of  an  enhancement  to  memory  controller  chip  on  an  MBus  within  a 
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SPARCstation.  The  SPARCstation  SX  system  relies  on  the  notion  that  graphics  operations 
become  extensions  of  native  integer  and  floating-point  operations  of  the  CPU.  The 
SPARCstation  SX  system  employs  a  closely-coupled  architecture,  wherein  a  dedicated 
processor  handles  low-level  display  operations  through  processor-memory  direct  access. 
Residing  directly  on  the  CPU-memory  bus,  the  SPARCstation  SX  memory  controller, 
called  SMC,  can  perform  extremely  fast  transfer  of  data  from  physical  memory  (DRAM) 
to  the  display  frame  buffer.  The  close-coupling  of  the  SMC  to  the  SPARCstation  CPU 
further  enables  floating  point-based  computation  to  utilize  the  high-performance 
transformation  capabilities  on  the  SPARC  processor.  Figure  5-1  illustrates  the  high-level 
architecture  of  the  SPARCstation  MBus  system,  including  CPU,  SMC  processor,  and 
video  memory. 

The  SMC  processor  lies  on  the  MBus  between  the  SPARC  CPUs  and  memory.  In 
addition,  Video  RAM  (VRAM)  is  mapped  into  the  same  address  space,  allowing  the  SMC 
to  access  and  manipulate  the  display  in  the  same  manner  as  DRAM.  This  addressing  gives 
the  SPARCstation  SX  system  superior  bandwidth  in  moving  large  size  images,  for 
example,  mammograms,  from  memory  to  the  screen.  Instructions  executing  in  one  or 
more  of  the  SPARC  CPUs  make  requests  of  the  memory  controller  (MC)  portion  of  the 
processor.  The  memory  controller  logic  fulfills  CPU  requests  for  memory  accesses,  such 
as  those  required  for  fetching,  loading  and  storing  instructions. 

When  one  of  the  CPUs  encounters  a  graphics  operation,  it  passes  it  to  the  SMC 
pixel  processor  (SXPP),  which  executes  the  instruction  directly.  When  memory  access  is 
required  to  complete  an  operation,  the  SXPP  makes  a  request  to  the  memory  controller. 
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Figure  5-1.  SPARCstation  SX  system  high-level  architecture  [26]. 


12  XTT,  Imaging  Library 

The  XIL  library  from  Sun  Microsystems  is  a  foundation-level  imaging  interface 
providing  common  functions  required  by  most  imaging  applications.  Such  a  library  is  an 
application  programming  interface  (API),  but  it  is  different  from  normal  APIs  because  it  is 
hardware  dependent.  The  XIL  library  contains  three  primary  components:  a  programming 
interface  specification  for  basic  imaging  functionality,  a  high-performance 
implementation  of  the  specification,  and  a  standard  hardware  interface  specification, 


39 


Applications 


• 

_i 

CL 

|  XIL 

1  Fax 

|  Video 

|  PHIGS 

1  PEXlib 

|   IRIS  G 

|  XGL 

Xlib 

PostScr 

XIL 

XGL 



X  Windows 

ImagingA/ideo 

2D/3D  Geometry 

SPARCstations  20SX 


Figure  5-2.  Solaris  foundation  graphics  libraries  and  layered  interface. 


which  enables  third-party  hardware  developers  to  readily  support  the  XIL  library  and 
applications  built  on  it  [28].  Figure  5-2  illustrates  the  XIL  library  in  relationship  to  other 
foundation-level  interfaces. 

Monadic  and  dyadic  operations  are  of  particular  interest  within  XIL  library.  Every 
XIL  operator  can  be  considered  an  atom.  Groups  of  XIL  operators  concatenated  in  a 
sequential  manner  are  known  as  molecules  which  perform  the  tasks  of  two  or  more  atomic 
functions.  Molecules  are  important  to  the  SPARCstation  SX  SMC  processor  because  they 
allow  the  device  to  notify  the  XIL  library  at  runtime,  and  obtain  hardware  support  for 
groups  of  chained  operations.  At  runtime,  the  SMC  device  handler  loads  into  memory  and 
notifies  the  XIL  library  of  the  atoms  and  molecules  it  wishes  to  replace  with  its  own 
equivalent  mapping  to  hardware.  The  XIL  library  uses  a  directed  acyclic  graph  to  hold  a 
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list  of  these  dependencies  during  execution.  When  the  XIL  library  encounters  a  known 
sequence  of  operations,  it  transfers  the  operation  to  the  SMC  for  execution.  XIL 
applications  can  realize  significant  performance  improvement  from  the  deferred 
execution. 

gj  Mnlriscale  Blndc  Algorithm 

In  frame-based  multi-scale  representations,  spatial  convolution  require  2l  -  1 
zeros  between  each  non-zero  filter  coefficients  in  a  filter  kernel  as  scale  changes.  This 
increases  the  computational  cost  exponentially.  However,  there  exists  an  algorithm  that 
reduces  the  cost  by  a  factor  of  2l .  The  basic  idea  behind  this  algorithm  is  to  keep  the  size 
of  the  filter  kernels  unchanged  throughout  multiscale  analysis.  Instead,  we  partition  the 
input  image  into  four  sub-regions  so  that  each  region  contains  only  those  pixels  affected 
by  the  filter  kernel  (without  expansion).  Figure  5-3  illustrates  differences  between  the  two 
approaches:  a  linear  convolution  with  zero-paddings  in  the  kernel,  and  an  alternative 
method  by  splitting  the  convolved  image  into  four  sub-regions  for  further  convolution  at 
finer  levels.  The  former  is  shown  in  Figure  5-3(a).  At  level  2,  zeroes  are  inserted  between 
the  non-zero  filter  coefficients.  Figure  5-3(b)  graphically  illustrates  the  new  method.  First, 
the  input  image  is  convolved  with  the  original  filter  kernel  at  level  1.  The  convolved 
image  is  then  divided  into  four  sub-regions  such  that  each  region  consists  of  only  those 
pixels  processed  by  the  convolution  at  the  next  level  (/  =  2).  After  this  splitting, 
convolutions  are  then  applied  to  each  of  the  sub-regions  independently.  To  visually 
describe  this,  we  show  within  the  input  image  matrix  four  distinct  pixel  shapes:  dot, 
triangle,  square,  and  cross,  in  such  a  way  that  pixels  that  belong  to  the  same  group  are 
identified. 


4! 


1=1  1  =  2 


(a)  Traditional  linear  convolution  at  scale  /  =  1,1  =  2.  One  zero  is  padded  between  non-zero 
filter  coefficients  in  the  filter  kernel  at  scale  2. 
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(b)  Alternative  approach  that  avoids  zero-paddings  between  the  non-zero  filter  coefficients.  We 
first  convolve  the  input  image  with  the  original  filter  kernel  at  the  level  1.  The  convolved  image  is 
then  divided  into  four  sub-regions,  marked  in  the  background  as  /,  //,  ///,  and  IV  with  half-grayed 
tone,  each  of  which  holds  pixels  that  are  precisely  affected  by  the  filter  kernel  coefficients  at  the 
next  level  /  =  2.  Four  distinct  pixel  shapes  inside  the  input  image  are  shown:  bullet,  triangle,  Box 
and  cross.  Only  pixels  that  have  the  same  shape  are  processed  by  convolution  at  the  next  level. 


Figure  5-3.  Difference  between  a  linear  convolution  with  zero-paddings  in  the  kernel  (a)  and  an 
alternative  method  with  four  sub-regions  (b). 
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Figure  5-4.  Multiresolution  block  convolution  for  image  synthesis.  Four  sub- 
regions  in  the  input  image  at  a  coarser  level,  marked  as  {/,  //,  ///},  and  {IV},  are 
merged  into  a  larger  region  at  a  finer  level.  Then,  convolution  is  applied  to  the 
merged  region. 


For  example,  let  us  compute  the  value  located  at  coordinate  (2,  2)  in  the  input 
image  shown  in  Figure  5-3(a)  marked  by  a  dot.  All  pixels  within  the  rectangular  area  from 
(0,  0)  to  (4, 4)  will  participate  in  the  computation.  However,  with  respect  to  traditional 
convolution  the  zeros  in  the  filter  kernel  at  level  2  effectively  mask  out  all  other  pixels 
having  shapes  other  than  a  dot.  Thus  pixels  that  actually  take  part  in  the  computation  are 
those  having  a  dot  shape  located  at  coordinates  (0, 0) ,  (0, 2) ,  (0, 4) ,  (2, 0) ,  (2, 2) , 
(2, 4) ,  (4, 0) ,  (4,  2)  and  (4,  4) .  The  region  labeled  /  shown  in  Figure  5-3(b)  at  the  level 
2  contains  exactly  those  pixels.  The  convolution  is  carried  out  separately  in  this  region  and 
gives  us  the  same  result  numerically  as  the  traditional  method. 

Synthesis  is  simply  the  reverse  of  analysis.  In  other  words,  we  first  merge  four  sub- 
regions  at  a  coarser  level  into  a  larger  region,  then  convolve  it  with  a  filter  kernel.  Figure 
5-4  shows  the  synthesis  process. 

For  quantitative  comparison,  let  us  compute  the  computational  cost  of  each 
method  of  implementation.  For  simplicity,  we  focus  on  the  analysis  only.  Let  the  size  of 
the  input  image  and  the  size  of  the  original  filter  kernel  be  n  and  m,  respectively.  And  let 
us  denote  the  level  of  analysis  by  /.  Then,  in  the  traditional  linear  convolution  case,  the 
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computational  cost  CA  is 

CA  =  n2-(m  +  2l-2)2 , 

and  in  our  approach,  the  computational  cost  CB  is 

^        2  y-i  2 
CB  =  n  -2      ■  m  . 

Normally,  the  size  of  the  filter  kernel  is  small,  so  we  can  assume  m  is  a  constant.  The 
speedup  by  the  new  approach  is  then 

=  0{2l). 

Algorithm  1  computes  the  two-dimensional  discrete  wavelet  transform  by  this 
more  efficient  method.  The  variables,  n  and  m,  denote  the  size  of  the  input  image  and  the 
size  of  the  filter  kernel,  respectively.  [Filter]x  Y  is  created  by  the  tensor  product  of  two 
ID  filter  kernels:  X,  Y.  Thus, 

[Filter]XY  =  XT. 

[X]ych  defines  a  ROI  of  size  wxh  starting  at  [rxw,  cxh].  There  are  three  global 
variables  in  the  algorithm:  LEVEL,  WIDTH,  HEIGHT.  By  LEVEL,  we  denote  the  number 
of  levels  for  the  transform,  and  by  WIDTH  and  HEIGHT  the  width  and  height  of  the  input 
image,  respectively. 

Algorithm  2  describes  our  algorithm  for  computing  two-dimensional  inverse 
discrete  dyadic  wavelet  transform  using  the  method  shown  in  Figure  5-4.  [Filter]  denote 
the  complex  conjugate  of  [Filter] . 

We  show  visually  in  Figure  5-5  the  process  of  a  two  dimensional  forward  wavelet 
transform  using  the  method  described  in  Algorithm  1.  In  Figure  5-5(b),  we  display  a 


Algorithm  1  2-D  discrete  dyadic  wavelet  transform  using  block  splitting  method 


Require  n:  dyadic  number  (n  >  m) 

for  /  =  0  to  /  <  LEVEL  do 

WIDTH  ,  HEIGHT 
w  =   —,h  =  

2'  2 

for  r  =  0  to  r  <  2*  do 

for  c  =  0  to  c  <  2l  do 

Obtain  a  region  of  interest  from  the  input  image  with 

[ROI]w'h<t= 

r'c      size  of  wxh  starting  at  [rxw,cxh] 

[Wh]w' *<t=  Split  ([ROI]»'ch  *  [Filter]L  G) 

L    f"t  C 

[Wi,]»'ch<=  Split  ([ROI]^ch  *  [Filter]C  L) 

[ROI]?<ch<=  Split  {[ROI]^ch  *  [Filter]H  H) 

end  for 
end  for 
end  for 


Algorithm  2  2-D  discrete  inverse  dyadic  wavelet  transform  by  block  merging  sub- 
regions 


Require  n:  dyadic  number  (n  >  m) 

for  /  =  LEVEL-  1  downto  /  =  0  do 

WIDTH  ,  HEIGHT 
w  =   —,h  =   -  

2  2 
for  r  =  0  to  r<21  do 
for  c  =  0  to  c  <  2l  do 


[ROI]?-ch  <=  Merge  sub-regions  of  ([ROI]^ch  *  [Filter]H<H) 
[ W ',] w> h  <=  Merge  sub-regions  of  ([Wh]w'h  *  [Filter]L,G) 
[ W 2, ]"'• *  <=  Merge  sub-regions  of  ([W*,]w'h  *  [Filter]GtL) 

2   r,c  2  r,c 

[ROI]?:ch  <=  [ROI]%ch  +  [W*,]»'ch  +  [Wj]** 


end  for 
end  for 
end  for 
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Figure  5-5.  (a)  Selected  ROI  containing  an  ill-defined  mass,  (b)  DC  components  at 
levels  /  =  1,  /  =  2  and  /  =  3  after  applying  Algorithm  1.  (c)  Processed  ROI  with 
suspicious  area  enhanced. 
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level 

Multiscale  Block 

Traditional  convolution 

FFT  approach 

1 

1.18 

0.83 

13.75 

2 

1.81 

2.87 

27.46 

3 

3.71 

7.87 

41.05 

4 

6.29 

26.01 

54.65 

5 

13.03 

93.93 

68.62 

Table  5-1.  Computational  costs  in  time  (seconds)  of  splitting  (in  decomposition)  and 
merging  (in  reconstruction)  schemes  versus  traditional  linear  convolution  with  zero- 
padding  between  each  non-zero  filter  coefficients  and  FFT  implementations. 


series  of  DC  components  at  levels  /  =  1 ,  /  =  2  and  /  =  3 .  Finally,  Figure  5-5(c)  shows 
an  enhanced  version  of  the  selected  ROI  shown  in  Figure  5-5(a). 

We  show  in  Table  5-1  the  computational  costs  in  time  (second)  for  enhancement 
between  methods  of  convolution  by  zero-padding  between  non-zero  filter  coefficients, 
forward  and  inverse  FFT,  and  our  new  multiscale  approach  of  splitting  and  merging  sub- 
regions.  Our  sample  input  ROI  was  256  x  256  and  was  computed  up  to  5  levels  of 
analysis.  The  size  of  the  original  kernels  used  for  the  decomposition  was  3x3  for  both 
horizontal  and  vertical  directional  filters.  In  the  reconstruction  case,  the  kernels  were 
3  x  5  in  the  horizontal  direction  and  5  x  3  in  the  vertical  direction.  Figure  5-6  shows 
graphically  comparison  of  the  costs  between  the  multiscale  block  algorithm,  traditional 
convolution  and  FFT-base  approach. 

The  costs  shown  in  the  Table  5-1  were  measured  on  a  SPARCstation  20  Model  71 
from  Sun  Microsystems  having  a  75MHz  SuperSPARC-II  processor  with  1MB 
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Figure  5-6.  Comparison  of  computational  costs  between  multiscale  block, 
traditional  convolution  and  FFT-base  approach. 


SuperCache,  96MB  RAM,  and  4MB  (60ns)  SX  frame  buffer.  This  table  clearly  shows  that 
our  approach  is  more  efficient  than  existing  convolution  schemes  and  exceeded  an  FFT  by 
a  factor  of  5  for  five  levels  of  processing. 


CHAPTER  6 
LIFTING  SCHEME 

In  this  chapter  we  review  the  basic  idea  behind  the  lifting  scheme  introduced  by 
Wim  Sweldens  [29]  [30]  for  constructing  compactly  supported  biorthogonal  wavelets.  We 
first  study  in  Section  6. 1  the  properties  of  second  generation  wavelets  which  emerged 
from  the  need  of  more  general  basis  functions  in  applications  on  general  domains.  In 
Section  6.2,  we  review  the  biorthogonal  wavelets  and  duals  to  see  how  they  are  related 
each  other  to  satisfy  a  perfect  reconstruction  condition  in  a  filter  bank  coding  system.  In 
Section  6.3  we  describe  the  mathematical  background  of  lifting  scheme  and  its  definition. 
We  then  discuss  in  Section  6.4  the  relation  between  interpolating  scaling  functions, 
associated  wavelets  and  the  lifting  scheme.  Finally  in  Section  6.5  we  show  how  this  lifting 
scheme  can  be  generalized  to  construct  any  wavelet  or  wavelet  transform  with  a  finite 
number  of  lifting  steps  [11]. 

6  1  Second  Generation  Wavelets 

Wavelets  \|T  k  are  traditionally  defined  by  translates  and  dilates  of  one  particular 
function,  a  mother  wavelet  \|/,  such  that  y^x)  =  yidx-k).  These  wavelets  are 
referred  to  as  first  generation  wavelets.  The  properties  of  first  generation  wavelets  are  the 
following:  They  form  a  Riesz  basis  for  L2(/?),  so  that  a  function /in  L2(R)  can  be 
represented  with  the  wavelet  basis  {YyjJ  as 
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f=  Tyj.M.k 

where  the  coefficients  y,,  k  are  computed  by  y,  k  =  (f,  \jfjt  k) .  vj>  is  a  dual  wavelet  of  i[f . 
The  wavelets  and  their  duals  are  well  localized  in  space  and  frequency.  In  other  words,  the 
wavelets  and  duals  have  compact  supports,  which  are  smooth  (i.e.,  they  decay  towards 
high  frequencies),  and  have  vanishing  polynomial  moments,  which  decay  towards  low 
frequency.  As  discussed  in  the  previous  chapter,  wavelets  are  also  utilized  into  a 
framework  of  multiresolution  analysis.  Finally,  they  mostly  rely  on  the  Fourier  transform 
as  a  basic  instrumental  tool  since  the  operations  of  the  wavelet  family  defined  by 
translates  and  dilates  become  algebraic  operations  in  Fourier  space. 

On  top  of  the  properties  of  the  traditional  wavelets,  second  generation  wavelets 
have  emerged  from  the  need  of  more  generalized  analyzing  functions  which  are  not 
necessarily  translates  and  dilates  of  one  basis  function.  Some  applications  on  general 
domains  require  wavelets  that  are  defined  on  arbitrary,  possibly  non-smooth,  domains  of 
Rn.  A  special  case  is  the  construction  of  wavelets  on  bounded  domains  without 
introducing  end  effects  near  boundaries.  Some  special  wavelets  are  also  required  in 
analyzing  data  on  curves,  surfaces  or  manifolds  independent  of  parametrization.  There  are 
also  problems  in  practical  applications  where  analysis  of  irregularly  sampled  data  may  be 
necessary,  which  can  not  be  achieved  by  first  generation  wavelets  and  traditional  wavelet 
transform  algorithms. 

Because  of  the  non-translation  invariance  property  of  its  basis  functions,  second 
generation  wavelets  can  no  longer  use  the  Fourier  transform  as  its  construction  tool.  A 
question  then  arises:  How  may  the  second  generation  wavelets  be  constructed?  The  lifting 
scheme  introduced  by  Wim  Sweldens  [29]  [30]  provides  a  profound  answer.  The  scheme 


50 

was  initially  introduced  as  a  tool  in  constructing  biorthogonal  wavelets.  Daubechies  and 
Sweldens  [11]  later  generalized  the  lifting  scheme  for  the  construction  of  any  wavelet  by 
factoring  wavelet  transforms  through  a  finite  number  of  lifting  steps.  In  the  next  section 
we  first  study  the  properties  of  the  biorthogonal  wavelets  and  duals  which  is  the  first 
footstone  in  developing  the  lifting  scheme. 

2  Riorthogonal  Wavelets  and  Duals 

The  orthogonality  property  in  wavelets  imposes  a  strong  limitation  in  the 
construction  of  wavelets.  The  Haar  wavelet  is  the  only  real-valued  wavelet  which  is 
compactly  supported,  orthogonal  and  symmetric  at  the  same  time  [9].  The  scaling  function 
(p e  L2 ,  and  the  wavelet  function  \|/eL2  both  contribute  to  a  multiscale  analysis  and 
must  satisfy  the  refinement  relations  in  the  sense  that, 

q>(*)  =  2^hkq>(2x-k)  , 

k  (6.1) 

k 

The  set  of  translated  scale  functions  { (p(jc  -  k)  \k  e  Z}  form  a  Riesz  basis.  The  refinement 
relations  are  also  applicable  to  their  duals  given  as  (p  and  \j>,  with  coefficients  hk  and  gk 
respectively,  such  that 

=  2^hkq(2x-k)  , 

k 

vji(x)  =  2j^~gkq(2x-k). 

k 

These  duals  also  fit  into  a  framework  of  multiresolution  analysis.  The  duals  are 
biorthogonal  to  their  primals  in  the  sense  that, 
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(q>(*).V(*-*)>  =  (vW.<PU-fc)>  =  0  . 
<9(*).<P(*-*)>  =  =  8* 

The  biorthogonality  condition  (6.2)  is  equivalent  to 


(6.2) 


Y<p{©  +  2Jbl)$(©  +  2jbl)  =  J]\jf((0  +  2itrc)^(ffl  +  2fat)  =  1, 


(6.3) 


Yy(C()  +  2Jbc)0<e)+2iklt)  =  £(p(©  +  2fc7t)ty(©  +  2*7t)  =  0. 

it  * 


Combining  the  conditions  (6.2),  (6.3)  and  the  refinement  relation  (6.1),  we  can  derive  a 
biorthogonal  condition  for  FIR  filters,  h,g,h,  and  g ,  as 


h((£>)h(G>)  +  /l((0  +  7t)/l(CO  +  7l)  =  1 

1 
0 

0, 


£(©)$(©)  +  g(CD  +  7t)g(©  +  71)  = 
g(C0)/l((0) +  g(C0  +  7l)/l(C0  +  Jl)  = 


/i(co)g(co)  +  h(d)  +  n)g((o  +  n) 
or  equivalently  in  matrix  notation, 


h((£>)  h((a  +  n) 
£(©)  g((0  +  %) 


h((0)  g((£>) 


1  0 

I 

0  1 

_h(G)  +  n)  g(©  +  Jt)_ 
Hence,  if  we  denote  modulation  matrices,  m(co)  and  m(a>) ,  respectively  by 


m(co)  = 


/i(co)  /i(co  +  71) 

£(©)  g(©  +  7t)_ 


and  m(co)  = 


/l((0)  /z(co  +  7t) 

jf(©)  g(©  +  7C)_ 


then,  the  biorthogonality  condition  (6.5)  becomes 


(6.4) 


(6.5) 


Vco  e  R,    m(co)m'(co)  =  I 


(6.6) 
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g 


Figure  6-1.  A  two  channel  subband  coding  scheme. 


By  transposing  equation  (6.5),  we  see  that 


h((0)h(Gi)  +  g((0)g(&)  =  1  (6?) 


/i(G))/!(co  +  rc)  +  g((o)£(a>  +  7i)  =  0. 
Equation  (6.7)  can  also  be  driven  by  a  general  two  channel  subband  coding  scheme, 
shown  in  Figure  6-1.  A  discrete  input  signal  s(n)  is  decomposed  into  an  approximation 
signal  a(n)  and  a  detail  signal  d(n).  They  are  defined  by 


a(2co)  =  |(A(co)5(o))  +  /i(to  +  7c)5(co  +  7c)) , 

d(2©)  =  r(g(C0)5(C0)  +  i(©  +  lCMC9  +  H))  . 


(6.8) 


The  reconstructed  signal  in  the  frequency  domain  can  be  written  by 

r(co)  =  a(co)s(co)  + p(co)^(co  +  K)  ,  (6.9) 

where 


a(co)  =  h(a)h((i))  +  g((£>)g((£>)  , 


(6.10) 


(3(C0)  =  /l((0)/l(C0  +  7t)  +  g(C0)g(C0  +  7t)  . 

In  order  to  achieve  perfect  reconstruction  in  subband  coding  system,  we  keep  only  the 
linear  shift-invariant  (LSI)  system  response  appearing  in  the  first  term  of  equation  (6.9) 
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and  cancel  out  the  system  aliasing  in  the  second  term  of  equation  (6.9).  From  a((o)  =  1 
and  |3(cg)  =  0 ,  we  can  derive  (6.7). 

By  applying  Cramer's  rule  to  equation  (6.7),  we  observe  that 


Mflfl.tfi  +  ft    and  g(CD)  =  (6.11) 
Dm(co)  £>m(co) 

where  D  (co)  =  det  m(co)  .  In  the  case  of  finite  filters,  we  need  to  restrict  that  Dm(co)  to 

be  a  monomial,  and  we  choose 

Dm(co)  =  -e~iw.  (6.12) 
By  applying  equation  (6.12)  to  equations  (6.11)  and  (6.7),  we  have 


g((0)  =  -e-iwh((0  +  n)  and  g((0)  =  -<rf<D/i((D  +  It) ,  (6.13) 
and  equation  (6.7)  becomes 


h((0)h((0)  +  h((0  +  ic)h((0  +  n)  =  1.  (6.14) 
The  filters  satisfying  (6.14)  are  called  conjugate  filters. 

After  finding  the  conjugate  filters,  we  can  construct  a  scaling  function.  By  the 
refinement  relation  (6.1),  the  Fourier  transform  of  the  scaling  function  is  then  defined  by 

where  //  is  the  Fourier  transform  of  the  finite  filter  h,  in  other  words 

00 

H((a)  =    £  h(n)e-in(i). 

n  =  -°° 

Since  cp(0)  =  1 ,  infinite  iterations  of  equation  (6.15)  with  a  conjugate  filter  H  satisfying 
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condition  (6.14)  yields  the  limiting  scaling  function 

oo 

<p(co)  =  Y[H(2-j(Q).  (6.16) 

7=1 

A  similar  statement  holds  for  the  dual  scaling  function  (j>  and  its  associated  dual 
filter  H.  Since  the  properties  of  the  filter  H  determine  the  properties  of  the  scaling 
function  such  as  the  smoothness  and  its  asymtotic  decay  at  infinity,  we  do  not  need  the 
scaling  function  itself  in  many  applications,  rather  we  directly  work  with  the  finite  filter 
H. 

Figure  6-2  shows  the  limiting  process  to  make  the  Daubechies's  four  tab  scaling 


1  6  11 


Figure  6-2.  Graph  showing  the  limiting  process  to  make  the  Daubechies  tap  4 
scaling  function.  From  top  to  bottom,  scaling  functions  without  iteration,  after 
one  iteration,  after  two  iterations,  and  after  fifteen  iterations,  respectively. 
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0.2 
0.1 


Figure  6-3.  Graph  of  root  mean  square  errors  involved  in  the  limiting  steps  to 
make  the  Daubecies's  tap  4  scaling  function  shown  in  Figure  6-2. 

function.  From  top  to  bottom  in  Figure  6-2,  they  are  scaling  functions  without  iteration, 
after  one  iteration,  after  two  iterations  and  after  fifteen  iterations,  respectively.  Figure  6-3 
displays  a  graph  of  root  mean  square  errors  involved  in  the  limiting  steps  to  make  the 
Daubecies's  four  tap  scaling  function.  The  limiting  process  makes  little  difference  after 
fifteen  iterations  (RMS  <  2.24e~16). 

As  pointed  out  by  Cohen  et  al.  [7],  useful  analyzing  properties  such  as  oscillations, 
and  moments  can  be  concentrated  on  the  \j/  whereas  the  synthesis  properties  such  as 
regularity  are  determined  by  \|f .  \jf  and  vj/  can  have  very  different  regularity  properties, 
however  in  practice,  \\f  is  much  more  regular  than  \j/  [10].  In  other  words,  analyzing  dual 
wavelets  have  more  vanishing  moments  than  synthesizing  wavelets. 

6.3  Lifting  Scheme 

We  recall  the  Vetterli-Herley  lemma  [32]  which  defines  a  complete 
characterization  of  filters,  h  and  h°ld ,  biorthogonal  to  a  given  filter,  h.  The  two  dual 
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filters  h  and  h°ld  are  biorthogonal  to  h  in  the  sense  of  (6.14),  and  they  are  related  by 


fc(co)  =  h°ld  +  e~i,0h((0  +  n)s(2Gi)  ,  (6.17) 
where  s((o)  is  a  trigonometric  polynomial.  In  other  words,  if  one  of  the  dual  filters  is 
biorthogonal  to  h,  and  they  are  related  through  (6. 17),  the  other  dual  is  biorthogonal  to  h  as 
well  [29].  By  combining  the  biorthogonal  condition  for  basis  functions  (6.3)  and  (6.17), 
we  can  show  the  following  relation  for  the  associated  FIR  filters.  From  an  initial  set  of 
finite  biorthogonal  filters  {h,  h°ld,  g°ld,  g}  ,  a  new  set  of  finite  biorthogonal  filters  can  be 
found  by 


■old. 


fc((D)  =  h    (co)  +  g(co)s(2a)) 
g(o>)  =  g"U((0)-h(fi»i(2»). 

This  is  called  lifting  and  this  operation  leads  to  the  (primal)  lifting  scheme.  The  primal  and 

dual  basis  functions  with  lifting  are  then  given  by: 

,  .  old,  x 

(p(x)  =  (p  (x) 

<p(*)  =  2^h°kld  <p(2x-k)  +  ^S-Mx-k) 

old,  -    -      ,     .*  (6-19) 


\\f(x)  =  (x)-^sk<p(x-k) 

k 

k 

There  are  several  facts  notable  in  (6.19).  The  scaling  function  does  not  change 
after  lifting.  The  wavelet  after  lifting  uses  the  scaling  function  in  the  same  level.  The 
wavelet  construction  formula  tells  us  that  the  property  of  the  wavelet  after  lifting  is  mostly 
determined  by  the  property  of  the  trigonometric  polynomial  s.  The  dual  wavelet  also 
changes.  However  since  the  changes  in  the  dual  wavelets  are  from  the  changes  in  the  dual 
scaling  functions,  the  changes  of  the  dual  wavelet  are  not  meaningful  by  themselves.  To 


57 


summarize,  the  lifting  algorithm  contributes  to  some  fundamental  and  meaningful  changes 
in  the  wavelet  and  the  dual  scaling  functions  through  s  which  defines  the  true  power  of  the 
lifting  scheme.  In  other  words,  a  multiresolution  analysis  can  be  achieved  by  starting  from 
a  simple  basis  function  and  building  a  more  performant  one  with  desirable  properties. 

We  can  also  define  a  dual  lifting  from  an  initial  set  of  filters  {h°ld,  h,  g,  g°ld}  and 
a  dual  lifting  polynomial  s  such  that 


/i((0)  =  h°ld(a)  +  g((0)~s(2(0) 


g(to)  =  ~gold(G))-hm~S(2(is). 
The  dual  lifting  scheme  is  thus  defined  by  simply  toggling  the  tildes  in  (6.19)  as 

.        -old,  > 
(p(x)  =  (p  (x) 

q>(jc)  =  2£*JH  <p(2*-*)  +  25-*V(*-*) 

-/  *  .  v-  ~<    I  (6-21) 

k 

=  2^g°kld(?(2x-k). 

k 

By  alternating  the  primal  and  dual  lifting  scheme,  one  can  come  up  with  a 
multiresolution  analysis  geared  for  specific  properties.  This  so-called  cakewalk  algorithm 
leads  to  a  canonical  form  of  the  wavelet  transform  shown  in  Figure  6-4.  Here  the  filters 
followed  by  successive  subsampling  define  the  Lazy  wavelet  transform  which  simply 
splits  an  original  signal  into  odd  and  even  subsets.  The  dual  lifting  predicts  wavelet 
coefficients  with  the  help  of  the  scaling  function  coefficients  based  upon  the  correlation 
present  in  the  finer  signal.  The  primal  lifting  then  updates  the  scaling  function  coefficients 
using  the  previously  predicted  wavelet  coefficients  to  remove  possible  aliasing  resulting 
from  downsampling. 
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Figure  6-4.  A  1-D  discrete  wavelet  transform  with  lifting  polynomials. 

Based  upon  the  formulae  of  the  primal  and  dual  lifting  schemes,  a  fast  forward  and 
inverse  wavelet  transform  can  thus  be  formulated  as 


FORWARD  TRANSFORM 

ft 


(I)    Xjt  =  j2^hk_2iXj+hk    and  y,,/ 

N-  1 
k 

N-l 

(ffl)       =  XJtl+  5>/_*Y,-iifc 


k 


TNVFRSF  TRANSFORM 

N-l 

(D  hi  =  5>'-*y,,* 

it 

/V- 1 

(III)  =  ^X^_A/  +  ^Xrf-2/YiJ 
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Here  N  and  N  are  the  orders  of  the  primal  and  dual  lifting  polynomials  defined  as  s  and  s 
respectively. 

Now  we  can  define  a  canonical  implementation  structure  of  the  lifting  scheme 
with  three  stages:  split,  predict  and  update,  idea  of  which  is  well  described  in  the  cakewalk 
algorithm  shown  in  Figure  6-4.  A  simpler  version  of  the  forward  wavelet  transform  with 
lifting  is  shown  in  Figure  6-5.  The  split  process  splits  the  input  data  {Xj  +  {}  into  two 
smaller  subsets  {  Xj}  and  {jj} .  From  the  perfect  reconstruction  property  in  the  polyphase 
representation,  the  most  trivial  subsets  are  even  and  odd  samples.  This  is  often  referred  to 
as  the  Lazy  wavelet  transform  [29]  as  described  earlier.  In  the  second  stage  Predict,  we  try 
to  use  the  {Xj}  subset  to  predict  the  {y;}  subset  based  on  the  correlation  present  in  the 
original  data.  The  {Xj}  subset  is  then  lifted  with  the  help  of  the  wavelet  coefficients  {jj}  . 
The  idea  is  to  find  a  better  {Xj}  so  that  a  certain  scalar  quantity  Qfj,  such  as  the  mean  is 
preserved,  QiXj+])  =  (&Xj)  .  These  aspects  lead  to  the  following  forward  wavelet 
transform: 


For/'  =  -  1  downto-n:  « 


Spdt(Xj+l) 


9_ 


X 


Split 


Predict 


Update 


Figure  6-5.  Canonical  structure  in  implementing  1-D  discrete  forward  wavelet 
transform  with  lifting. 
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where  !Pand  U  are  prediction  and  update  operators,  respectively.  The  inverse  transform  is 
immediately  driven  by  reversing  the  operations  and  toggling  the  signs: 


For/'  =  -  n  upto  -  1 : 


y.  =  y.  +  <BQ,j) 


Figure  6-6  shows  the  wavelet  representation  of  a  ROI  in  a  mammogram  with 
lifting  after  level  3  decomposition.  The  wavelet  transform  presented  here  in  fact  is  the 
(N  =  2,  N  =  2)  biorthogonal  wavelet  transform  of  Cohen-Daubechies-Feauveau  [7]. 

n  A  Interpol  atinp  Scaling  Functions  Associated  Wavelets  and  Lifting  Scheme 

Sweldens  [29]  [30]  [31]  showed  that  the  lifting  scheme  described  in  the  previous 
section  is  connected  with  interpolating  scaling  functions.  The  use  of  interpolating  scaling 
functions  in  multiresolution  analysis  has  been  studied  by  several  authors  [12]  [13].  The 
main  advantage  of  the  interpolating  scaling  function  is  that  the  coefficients  Xk  in  a 
function  expansion  f{x)  =  JX^x-Jk)  can  be  easily  obtained  by  lk  =  f(k)  since  the 

k 

interpolating  scaling  function  is  defined  as  (p(fc)  =  8^  for  all  k  e  Z .  By  combining  the 
definition  of  the  interpolating  scaling  function  and  the  refinement  relation  (6.1),  we  get 
hlk  -  y  which  becomes 

/i(CQ)  +  fc((0  +  7t)  =  1-  (6-22) 
Filters  satisfying  (6.22)  is  called  interpolating  filters,  a -trous  filters  are  the  interpolating 
filters  which  are  used  to  compute  coefficients  of  a  continuous  wavelet  transform  [8]  [24]. 

There  have  been  several  approaches  to  compute  interpolating  scaling  functions. 
Here  we  briefly  discuss  one  proposed  by  Deslauriers  and  Dubuc  [12]:  interpolating 
subdivision,  which  is  connected  to  our  Overcomplete  Lifting  Scheme.  The  interpolating 
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Figure  6-6.  Wavelet  representations  of  a  ROI  in  a  mammogram  computed  via  lifting 
for  three  levels  of  decomposition. 
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subdivision  is  formally  defined  as  follows.  We  first  construct  a  polynomial  P  of  degree  N- 
1  so  that  P(xj  k  +  l)  =  \k+l  for  -  D+  1  </<£),  where  N  =  2D.  We  then  calculate 
coefficients  on  the  next  finer  level  as  the  value  of  this  polynomial  at 
JL ,  2k+{  =  P(xj+l  2k+0-  Figure  6"7  snows  tne  scaling  functions  with  interpolating 
subdivision  of  order  2, 4,  6,  and  8  from  top  to  bottom. 

These  interpolating  scaling  functions  have  compact  supports,  and  are  symmetric, 
so  that  a  cp(jc)  is  zero  outside  the  interval  [-N+  l,N-  1] .  The  subdivision  interpolating 
scaling  functions  and  their  translates  reproduce  polynomials  up  to  degree  N-  1 .  They 


Figure  6-7.  Interpolating  scaling  functions  resulting  from  interpolating 
subdivisions  of  order  2, 4,  6  and  8  from  top  to  bottom 
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also  satisfy  the  refinement  equation  (6.1),  which  is  critical  in  defining  a  multiresolution 
analysis.  Since  the  order  of  the  reproduced  polynomials  is  invariant  within  an  interval,  the 
resulting  scaling  functions  are  well  adapted  on  an  interval  of  finite  signal  without 
extensions  such  as  zero  padding,  periodization  or  reflection.  Figure  6-8  shows  an  example 
of  cubic  (N  =  4)  scaling  functions  changing  their  shapes  to  accommodate  themselves  near 
the  left  boundary  without  degrading  the  regularity.  The  interpolation  properties  previously 
mentioned  are  still  preserved  well  on  the  boundary  for  these  scaling  functions. 

Table  6-1  lists  the  lifting  coefficients  needed  for  the  interpolating  subdivision 


Figure  6-8.  The  cubic  (N  =  4)  interpolating  scaling  functions  near  boundary  (top  to 
bottom)  changing  their  shapes  to  alleviate  boundary  effects. 
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#  of  X's  on 
(left,  right) 

Lifting  Coefficients 
(N  -  2) 

(0,  2) 

(-0.5,  1,5) 

(l.D 

(0.5, 0.5) 

(2,  0) 

(1.5,-0.5) 

(a) 


#  of  X's  on 
(left,  right) 

Lifting  Coefficients 
iN  =  4) 

(0,  4) 

(2.1875,-2.1875,  1.3125,-0.3125) 

(1.3) 

(0.3125,  0.9375,  -0.3125,  0.0625) 

(2,2) 

(-0.0625,  0.5625,  0.5625,  -0.0625) 

(3,  1) 

(0.0625,  -0.3125,  0.9375,  0.3125) 

(4,0) 

(-0.3125,  1.3125,-2.1875,2.1875) 

(b) 


#  of  X's  on 
(left,  right) 

Lifting  Coefficients 
(N  =  6) 

(0,  6) 

(2.7070,  -4.5117,  5.4141,  -3.8672,  1.5039,  -0.2461) 

(1,5) 

(0.2461,  1.2305,  -0.8203,  0.4922,  -0.1758, 0.0273) 

(2,  4) 

(-0.0273,  0.4102,  0.8203,  -0.2734,  0.0820,  -0.01 17) 

(3,3) 

(0.01 17,  -0.0977,  0.5859,  0.5859,  -0.0977,  0.01 17) 

(4,  2) 

(-0.01 17,  0.0820,  -0.2734,  0.8203,  0.4102,  -0.0273) 

(5,  1) 

(0.0273,  -0.1758,  0.4922,  -0.8203,  1.2305, 0.2461) 

(6,  0) 

(-0.2461,  1.5039,  -3.8672,  5.4141,  -4.5117,  2.7070) 

(O 


#  of  X's  on 
(left,  right) 

Lifting  Coefficients 
(JV  =  5) 

(0,  8) 

(3.1421,  -7.3315,  13.1968,  -15.7104,  12.2192,  -5.9985,  1.6919,  -0.2095) 

(1,7) 

(0.2095,  1.4663,  -1.4663,  1.4663,  -1.0474,  0.4888,  -0.1333,  0.0161) 

(2,  6) 

(-0.0161,  0.3384,  1.0151,  -0.5640, 0.3384,  -0.1450,  0.0376,  -0.0044) 

(3,5) 

(0.0044,  -0.0513,  0.4614,  0.7690,  -0.2563,  0.0923,  -0.0220,  0.0024) 

(4,4) 

(-0.0024,  0.0239,  -0.1 196,  0.5981, 0.5981,  -0.1 196, 0.0239,  -0.0024) 

(5,3) 

(0.0024,  -0.0220,  0.0923,  -0.2563,  0.7690,  0.4614,  -0.0513,  0.0044) 

(6,  2) 

(-0.0044,  0.0376,  -0.1450,  0.3384,  -0.5640,  1.0151,  0.3384,  -0.0161) 

(7,  1) 

(0.0161,  -0.1333,  0.4888,  -1.0474,  1.4663,  -1.4663,  1.4663,  0.2095) 

(8,0) 

(-0.2095,  1.6919,  -5.9985,  12.2192,-15.7104,  13.1968,  -7.3315,3.1421) 

(d) 


Table  6-1.  Lifting  coefficients  based  on  the  number  of  X's  on  its  left  and  right,  (a),  (b), 
(c)  and  (d)  show  the  filter  coefficients  for  N  =  2,  N  =  4  N  =  6  and  N  =  8,  respectively 
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scheme  which  are  computed  from  Neville's  algorithm.  Table  6- 1(a),  (b),  (c)  and  (d)  show 
the  filter  coefficients  for  N  =  2,  N  =  4  N  =  6  and  N  =  8,  respectively.  Each  lists  all 
coefficients  to  compute  values  at  every  possible  location  newly  introduced  as  a  middle 
point  at  the  next  finer  level. 

By  filling  in  the  refinement  relation  in  the  interpolating  case,  we  see  that 

which  can  be  written  as 

'    %  k  =  (Pj+l,2fc  +  2/l/'.*.2m+l(P/+l)2m+l  ■ 
m 

An  approximation  Ajf  from  a  finer  one  Aj  +  /  is  then  built  by  simple  subsampling, 
\.  k  =  Xj+l  2k'  followed  by  one  step  subdivision  on  Aj : 

A/=  ^j,k%,k  =  ^^+l>2*(P;+l,2Jt  +  ^Z^,*/l7.*,2m+l(Py+l,2m  +  l  ■ 

The  detail,  Aj+i-  A. ,  consists  only  of  functions  of  the  form  97  +  i,  2m  +  l  • Tnus  tne 
wavelets  are  given  by  \fj  m  =  9y+i,2m  +  i-  However,  a  necessary  condition  for  the 
wavelets  to  form  a  Riesz  basis  is  that  (dual)  vanishing  moment  N  is  at  least  one.  A  new 
wavelet  with  N  is  then  formulated  by  following  with  lifting  scheme: 

Vj,m  =  %+  l,2m+  1  ~  ^Sj,m%m  +  l  ' 

1  =  0 

where  the  lifting  coefficients  sj  m  are  chosen  so  that 

JV/,mU)  =  0.  J*¥/;m(*)  =  0,   ,  p-'Vy.mU)  =  0  . 

In  other  words,  the  new  wavelet  is  composed  of  the  old  wavelet,  a  subsampled 
scaling  function  located  at  odd  indices,  and  its  immediate  neighboring  scaling  functions  at 


66 

the  same  scale  j.  Figure  6-9  shows  wavelets  with  two  (dual)  vanishing  moments,  N  =  2 , 
associated  with  interpolating  scaling  functions  the  order  of  the  subdivision  of  which  are  2, 
4,  6,  and  8  from  top  to  bottom.  Figure  6-10  shows  the  wavelets  affected  by  the  boundary. 
The  cubic  (N  =  4)  scaling  functions  are  used  to  construct  the  boundary  wavelets  having 
N  =  2  vanishing  moments. 

The  following  shows  how  to  increase  the  number  of  vanishing  moments  of  the 
Haar  wavelet  (simplest  wavelet)  from  one  to  two  with  the  lifting  scheme.  The  initial  filters 
associated  with  the  Haar  wavelet  and  the  scaling  function  are  given  by: 


h°ld((0)  =  fc((D)  =  l/2+l/2<T,a\ 
g(co)  =  goW(a>)  =  -1/2+1/2*-"° 


Figure  6-9.  Wavelets  with  two  (dual)  vanishing  moments  (N  =  2)  associated  with 
interpolating  scaling  functions  whose  orders  of  the  subdivision  are  2,  4,  6,  and  8  from 
top  to  bottom. 
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Figure  6-10.  Wavelets  accommodating  the  left  boundary.  Here  N  =  4  and  N  =  2, 


After  lifting,  we  have 


old, 


g(co)  =  g    (co)-/i(co)5(2co)  . 
In  order  to  have  two  vanishing  moments,  g(0)  =  0  ,  and  g'(0)  =  0  .  By  solving  the 
linear  equations,  s(a>)  =  -t'/4sinco.  This  leads  to  one  of  the  biorthogonal  filters 
proposed  by  Cohen-Daubechies-Feauveau  [7]: 


-  1    2ico      1    10)     1      1  -/to      1    -2io)      1  -3/0) 

h  =  -—e     +  — e    +-  +  -e     +  —e      -  —e 
16         16        2    2         16  16 


(6.23) 


Lifting  (6.18)  allows  us  to  get  a  faster  implementation  of  the  wavelet  transform  if 


•old 


h     is  the  shortest  filter.  The  standard  wavelet  transform  for  the  low-pass  channel  using 
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the  filter  (6.23)  is 

h  I  -  '^V  1.21-2  +  ^  + 1.21-1  +  \h+  I-  2/  +  (624) 

1.  1  *  1  , 

2^+  1,2/+  1  +  16A;'+  1.2/  +  2     16A7+  1, 2/  +  3- 

Using  the  original  filters  associated  with  the  Haar  basis  function  and  lifting 
coefficients  with  two  vanishing  moments,  5,  =        =  1/8 ,  equation  (6.24)  becomes 

hl  =  l/2^+.)2/+1/2^+l,2/+l  +  1/8Y/!/-l-1/8V/,/+l'  (6-25) 

which  has  fewer  operations. 

The  complexity  of  the  lifting  scheme  in  this  case  is  equivalent  to  the  one  required 
for  the  length-4  direct  filter  bank  algorithm  instead  of  the  length-6's.  Table  4-1  in  Chapter 
4  shows  that  nOPsdi„ct\  =  (4  0  3)  =  7,  and  nOPsdirect\  =  (6  ©  5)  =  11.  The 
speedup  achieved  by  the  lifting  is  thus  the  factor  of  1.57.  More  details  on  the  complexity 
of  the  lifting  scheme  will  be  studied  later  in  this  chapter. 

f>  5  Generalized  T.iftinp  Scheme 

We  have  seen  in  the  previous  sections  that  there  are  many  advantages  of  the  lifting 
such  as  an  in-place  implementation  of  the  fast  wavelet  transform,  no  need  for  the  Fourier 
transform,  easiness  of  the  inverse  transform,  capability  of  custom-design  of  wavelets  and 
adaptiveness  of  wavelet  transforms.  However  the  lifting  scheme  discussed  was  used  in 
constructing  biorthogonal  wavelets.  However  can  the  lifting  be  used  in  constructing  any 
wavelets  or  wavelet  transforms  with  finite  filters?  Daubechies  and  Sweldens  addressed  the 
generality  of  the  lifting  by  connecting  the  Euclidean  algorithm  to  a  subband  transform  in 
the  z-domain  [11]. 

The  z-transform  of  a  FIR  filter  h  is  a  Laurent  polynomial  H(z)  given  by 
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k  =  a 

where  a  and  are  respectively  the  lower  and  upper  bound  of  the  support  of  the  filter  h. 
The  degree  of  the  Laurent  polynomial  H(z)  is  given  by  \H(z)\  =  b-a,  so  that  a 
polynomial  Cz  has  zero  degree  and  is  a  monomial,  where  C  is  a  non-zero  constant  and  / 
is  an  integer.  A  Laurent  polynomial  is  invertible  if  and  only  if  it  is  a  monomial. 

The  monomial  constraint  plays  a  key  role  in  finding  a  modulation  matrix  or  a 
polyphase  matrix  in  a  filter  bank  coding  system.  Figure  6-11  shows  a  discrete  wavelet 
transform  in  z-domain.  The  conditions  for  perfect  reconstruction  are  given  by 


H(z)H(rl)  +  G(z)G(z-1)  =  2, 
H(z)H(-z-x)  +  G(z)G(-z-1)  =  0. 

The  modulation  matrices  M(z)  and  M(z)  are  defined  as 


(6.26) 


M(z)  = 

H(z)  H(-z)1 

,  M{z)  = 

H(Z)  m-z) 

_G(z)  G(-z)_ 

G{z)  G(-z) 

(6.27) 


By  using  these  modulation  matrices,  the  perfect  reconstruction  condition  (6.26)  can  be 


Figure  6-11.  Discrete  wavelet  transform  in  z-domain. 


70 


written  as 

M(-z-!)'M(z)  =  21,  (6.28) 
where  I  is  the  2  by  2  identity  matrix. 

We  can  simplify  the  perfect  reconstruction  in  the  filter  bank  structure  using  a 
polyphase  representation  which  provides  an  efficient  and  convenient  tool.  The  polyphase 
implementation  works  on  different  phases  separately.  The  input  is  separated  into  even  and 
odd  phases  followed  by  subsampling.  Then  the  separate  phases  of  the  filters  act  in  parallel 
on  the  separate  phases  of  the  input.  Figure  6-12  shows  a  polyphase  representation  of  the 
wavelet  transform  shown  in  Figure  6-11.  The  polyphase  representation  of  a  filter  H  is 
given  as  H(z)  =  He(z2)  +  z~lH0(z2) ,  where  He  and  H0  contain  respectively  the  even 
and  odd  coefficients.  If  we  denote  the  polyphase  matrix  P(z)  as 


P(z)  = 


He(z)  Ge(z) 
H0{z)  Ga(z) 


(6.29) 


then  the  relation  between  the  modulation  matrix  (6.27)  and  the  polyphase  matrix  (6.29) 
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becomes 


1  1 

z  -z 


M'(z)  =  ^ 


M(z) 


1  z 
1  -z 


(6.30) 


Now  the  perfect  reconstruction  property  with  modulation  matrices  (6.28)  can  be  written  in 
polyphase  matrices  as 

p(z)p(z~ly  =  i.  (6-31) 

From  the  invertible  property  of  Laurent  polynomials  and  (6.31),  the  determinant  of  P(z) 
must  be  a  monomial  in  z  so  that  P(z)  =  Czl .  In  other  words,  finding  FIR  filters  for  a 
wavelet  transform  becomes  finding  a  polyphase  matrix  P(z)  with  the  determinant  one.  By 
combining  this  monomial  property  and  (6.31),  the  dual  polyphase  matrix  P(z)  is  given  as 


P(z)  = 


Ga(z-1) -H0(z-1) 
-GAz~X)  HAz-1) 


The  most  trivial  case  of  the  polyphase  matrix  satisfying  the  condition  of  perfect 
reconstruction  is  P(z)  =  I  which  implies  H(z)  =  H(z)  =  1  and  G(z)  =  G{z)  =  z_1  • 
The  transform  with  P(z)  =  I  is  often  referred  as  the  Lazy  wavelet  transform  since  it  does 
nothing  but  separating  the  input  into  even  and  odd  subsets. 

Using  the  polyphase  structure,  the  lifting  given  in  equation  (6.18)  can  be  written  in 
z-domain  as 


H{z)  =  Hold(z)  +  G(z)S(z~2) 
G(z)  =  G°ld(z)-H(z)S(z2), 


(6.32) 


and  the  dual  lifting  in  equation  (6.20)  as 


H(z)  =  Hold{z)  +  G(z)S(z-2) 
G(z)  =  Gold(z)-H(z)~S(z2)  . 


(6.33) 
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There  are  two  noble  identities  for  a  decimator  and  interpolator.  In  most 
applications  a  decimator  is  preceded  by  a  bandlimiting  filter  H{z)  to  avoid  aliasing.  An 
interpolation  filter,  on  the  other  hand,  is  followed  by  an  interpolator  to  eliminate  imaging 
effect.  Figure  6-13  shows  equivalent  identity  systems.  In  Figure  6- 13(a),  we  have  a 
decimator  followed  by  a  transfer  function  F(z) .  This  cascade  is  equivalent  to  the  one  in 
the  right  provided  F(z)  is  a  rational  transfer  function,  i.e.  a  ratio  of  polynomials  in  z-1 .  In 
Figure  6- 13(b),  we  show  that  the  two  cascades  in  the  left  and  right  are  equivalent  for  a 
interpolator.  Based  on  the  property  of  the  noble  identities  with  M  =  2  and  L  =  2 ,  the 
polyphase  components  of  H(z)S(z2)  are  He(z)S(z)  for  even  and  Ha(z)S(z)  for  odd.  The 
new  polyphase  matrix  after  primal  lifting  is  thus  given  by 


P(z)  =  P  (z) 


1  S(z) 
0  1 


(6.34) 


and  the  dual  polyphase  matrix  by 


-old 

P(z)  =  P  (z) 


1  0 

S(z~l)  1 


(6.35) 


F(z) 

F{zM) 

\M 

(a) 

-Ml 

t\zL) 

(b) 


Figure  6-13.  Noble  identities  for  multirate  systems.  Identity  systems  for  (a)  a 
decimator,  and  (b)  a  interpolator. 
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Similarly,  the  new  polyphase  matrix  after  dual  lifting  is  given  as 


P(z)  =  P°l\z) 


1  0 

S(z)  I 


(6.36) 


and  the  dual  polyphase  matrix  as 


-old 

Piz)  =  P  (z) 


1  S(z-1) 
0  1 


(6.37) 


The  Euclidean  algorithm  developed  to  find  the  greatest  common  devisor  (GCD)  of 
two  natural  numbers  can  be  extended  to  find  a  GCD  of  two  Laurent  polynomials  [11].  In 
other  words,  two  Laurent  polynomials  A(z)  and  B(z)  are  presented  as 


A(z) 
B(z) 


=  n 

i=  1 


Qi(z)  i 

1  0 


An{z) 
0 


(6.38) 


where  An(z)  is  the  GCD,  n  is  the  smallest  number  for  which  Bn(z)  =  0,  and 
Q.{z)  =  A^^/B^^z).  Here  if  An(z)  is  a  monomial,  then  A(z)  and  B(z)  are 
relatively  prime.  We  can  always  choose  the  quotients  Q^z)  so  that  the  GCD  An(z)  is  a 
constant. 

By  applying  the  Euclidean  algorithm  to  polyphase  matrices,  we  can  factor  any  pair 
of  complementary  filters  (H(z),G(z))  into  lifting  steps.  A  filter  pair  (H(z),G(z))  is 
defined  to  be  complementary  if  the  determinant  of  the  corresponding  polyphase  matrix 
P(z)  is  one.  The  polyphase  matrix  can  be  given  as 


P(z)  = 


n 

i  =  1 


1  -SM) 
0  1 


1  0 

Stiz)  1 


K  0 

4 


(6.39) 


74 


and  the  dual  polyphase  matrix  as 


P(z)  = 


(  m 

n 


1  0 

^(z-1)  1 


1  -^(z-1) 
0  1 


0  K 


(6.40) 


Here  K  is  a  non-zero  constant,  and  m  =  ^  +  1 .  The  polyphase  matrix  for  analysis  filter 


bank  P(z~1)'  is  thus  written  as 


P(z-lY  = 


1 


0 


K 
0  K 


f  l 

n 

Vi  =  m 


1  0 

-Stiz)  1 


1  5,(z) 
0  1 


(6.41) 


Figure  6-14  shows  the  forward  wavelet  transform  using  the  dual  polyphase  matrix 
with  finite  lifting  steps  given  in  (6.41).  Figure  6-15  represents  the  finite  steps  of  the 
inverse  wavelet  transform  using  the  polyphase  matrix  representation  in  (6.39). 

Every  wavelet  or  wavelet  transform  can  thus  be  obtained  with  a  factorization  of 
the  polyphase  matrix  representation  of  the  filter  bank  by  means  of  Euclidean  algorithm. 


Hz-XY 


Figure  6-14.  Forward  wavelet  transform  using  a  dual  polyphase  matrix  P(z_1)fwith 
finite  lifting  steps. 
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P(z) 

Figure  6-15.  Inverse  wavelet  transform  using  the  polyphase  matrix  P(z)  with  finite 
lifting  steps. 

6     Computational  cost  of  the  T.iftinp  Scheme 

To  compute  the  computational  cost  of  the  lifting  scheme,  we  use  the  generic 
scheme  presented  in  Figure  6-14.  We  need  to  find  one  or  more  lifting  filters  s  and  s,  and 
normalization  factors  K  and  l/K,  such  that  this  scheme  is  equivalent  to  the  one  in  Figure 
6-1. 

For  the  purpose  of  the  cost  comparison,  we  can  rewrite  the  DWT  algorithm  using 
the  lifting  scheme  with  the  following  implementation  steps.  Since  the  inverse  transform 
always  follows  immediately  from  the  forward  transform,  we  only  focus  on  the  forward 
transform. 

1.  Split 

M0)<-*2i 

2.  Lifting  steps 

for  /  =  1  to  L 
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Primal  lifting:  Ml)  <r-Xil~^  +  Jjjrf-  1 ) 
Dual  lifting:    ^  <-  yil  ~ 1  >  -  " 1 5 

end  for 
3.  Normalization 

Let  L  be  the  length  of  each  of  the  two  filters  h[n]  and  g[n]  used  in  the  direct 
filter  bank  (standard)  algorithm.  As  described  in  Chapter  4,  the  assumption  of  the  same 
length  of  two  filters  does  not  lose  generality  since  we  can  pad  shorter  filters  with  zeros  to 
have  the  same  length  as  the  longest  filter.  Daubechies  and  Sweldens  [11]  showed  that  the 
computational  cost  of  the  lifting  scheme  with  two  length-L  filters  is  2L  +  2 .  Thus  the  cost 
of  the  lifting  scheme  per  input  point  in  the  computational  cell  nOPsHjting  is  given  as 

nOPslifting  =1+1.  (6.42) 

Since  nOPsdirect  =  2L  -  1 ,  the  lifting  scheme  can  achieve  an  overall  speedup  of 
2  over  the  direct  filter  bank  (standard)  algorithm. 

Figure  6-16  shows  the  comparison  of  the  costs  between  the  various  DWT's 
described  in  Chapter  4,  such  as  the  direct  filter  bank  algorithm,  the  FFT-based  algorithm 
and  the  short-length  FIR  algorithm,  and  the  complexity  of  the  lifting  scheme.  Figure  6-16 
tells  us  that  the  lifting  scheme  is  more  efficient  than  the  FFT-based  DWT  algorithm  for 
lengths  up  to  L  =  34 . 
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Figure  6-16.  Comparison  of  the  costs  between  various  DWT  algorithms 
described  in  Chapter  4  and  the  lifting  scheme. 


CHAPTER  7 
OVERCOMPLETE  LIFTING  SCHEME 

In  this  chapter  we  propose  an  overcomplete  lifting  scheme  which  provides  a 
framework  for  multiscale  analysis  with  overcomplete  wavelet  representations.  As 
described  in  the  previous  chapter,  the  lifting  scheme  is  a  flexible  tool  for  constructing 
compactly  supported  second  generation  wavelets  which  are  not  necessarily  translates  and 
dilates  of  one  mother  wavelet  [29].  We  have  seen  that  the  lifting  scheme  uses  a  simple 
basis  function  and  builds  a  better  performing  one  with  desirable  properties.  Flexibility 
afforded  by  the  lifting  scheme  allows  basis  functions  to  change  their  shapes  near  the 
boundaries  without  degrading  regularities.  The  lifting  scheme  also  provides  faster 
processing  by  making  optimal  use  of  similarities  between  high  and  low  pass  filters.  The 
lifting  algorithm  was  originally  introduced  to  construct  biorthogonal  wavelets  associated 
with  interpolating  scaling  functions.  We  showed  in  the  previous  chapter  that  the  lifting 
scheme  can  be  generalized  with  a  finite  number  of  lifting  steps  to  construct  any  wavelets. 

The  lifting  scheme  utilizes  a  classical  two  channel  filter  bank  as  a  framework  for 
multiresolution  analysis.  However  this  traditional  framework  is  not  translation  invariant 
[20].  Representations  with  a  translation-invariant  characteristic  are  highly  desirable  for 
feature  analysis.  Mallat  and  Zhong  introduced  an  adaptive  sampling  based  upon  local 
maxima  as  an  overcomplete  wavelet  representation  [20]. 

In  this  chapter  we  address  the  following  question:  Can  the  lifting  scheme  be  used 
as  a  framework  for  overcomplete  wavelet  representations  with  feature  analysis  in  mind? 
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Our  work  addresses  this  question  by  investigating  each  stage  of  the  multiscale  analysis 
described  by  Sweldens  [29].  We  use  a  smoothing  Lazy  wavelet  in  the  split  stage  which 
does  not  subsample,  but  smooths  an  input  image  so  that  the  low-pass  channel  contains 
some  redundant  information.  We  then  show  that  the  following  predict  and  update  stages, 
based  upon  specific  properties,  indeed  lead  to  a  useful  multiresolution  analysis. 

7  1  TTnHe.dmate.H  Discrete.  Wavelet  Transform  with  Lifting 

In  the  previous  chapter  we  showed  how  the  (primal)  lifting  (6.19)  and  the  dual 
lifting  (6.21)  were  driven  and  how  they  can  be  embedded  in  a  framework  for  a  multiscale 
analysis  which  was  implemented  in  the  filter  bank  coding  system  shown  in  Figure  6-4. 
This  computational  structure  was  simplified  later  as  a  semantic  form  in  Figure  6-5.  Here 
the  filters  followed  by  successive  subsampling  was  defined  as  the  Lazy  wavelet  transform 
which  simply  splits  an  original  signal  into  odd  and  even  subsets.  The  dual  lifting  predicted 
wavelet  coefficients  with  the  help  of  the  scaling  function  coefficients  based  upon  the 
correlation  present  in  the  finer  signal.  The  primal  lifting  then  updated  the  scaling  function 
coefficients  using  the  previously  predicted  wavelet  coefficients  to  remove  possible 
aliasing  resulted  from  the  downsampling. 

For  overcomplete  representations,  we  introduce  a  smoothing  lazy  wavelet  in  the 
split  stage  which  does  not  downsample  the  input  image  so  that  ^+1,*  =  ^/j  =  Yj,m- 
Here  we  choose  the  indices  such  that  y  e  J,  I  e  K(j) ,  k  e  K(J  +  1 ) ,  and  m  e  M(j)  where 
M(J)  =  K(J  +  1  )\  K(j) .  We  then  smooth  X,  k  using  a  smoothing  filter,  which  is  often  a 
Gaussian  [20].  This  new  splitting  methodology  in  the  filter  bank  scheme  is  then  followed 
by  the  decorrelation  of  wavelet  coefficients  using  the  dual  lifting.  Since  the  new  split 
operation  does  not  use  downsampling  to  make  distinct  subsets  {XL  t}  and  {y,  /} ,  there  is 
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no  aliasing.  The  primal  lifting  introduced  to  compensate  for  the  aliasing  by  preserving 
energy  between  two  contiguous  approximations  needs  no  longer  exist.  The  new  forward 
and  inverse  wavelet  transform  algorithm  for  overcomplete  representations  can  thus  be 
formulated  as 

FORWARD  TRANSFORM 

Split:*,;,  =  J2%hk-2frj+i,k  >  Y/,m  =  h+i.k 

k 

Dual  Lifting:  yjm  =  Yy>m-  j£  ^,m-2« 

n  =  0 

TNVFRSR  TRANSFORM 

N-\ 

Inverse  Dual  Lifting:  yjm  =  yjm  +  £  sn\m.2n 

n  =  0 

Join:  Xj+hk  =  \i  +  \m 

where  K(j+  1),  L(j)  and  M(j)  are  index  sets  such  that  1^0'+  1)1  =  =  l^0')l . 

{k\ke  K(j  +  1)},  {/|/e  L(j)}  andm|me  M(j).  N  and  are  primal  and  dual  vanishing 
moments,  respectively.  Figure  7-1  shows  an  undecimated  version  of  two  channel  analysis/ 
synthesis  filter  bank  with  the  newly  proposed  overcomplete  lifting  scheme.  By  iterating 
this  through  the  low -pass  channel,  we  can  compute  the  1-D  discrete  wavelet  transform  for 
overcomplete  multiscale  representations. 

In  the  previous  chapter  we  discussed  how  interpolating  scaling  functions  are 
related  to  the  lifting  scheme  through  the  interpolating  subdivision  proposed  by  Deslauriers 
and  Dubuc  [12].  Here  we  use  the  same  mechanism,  but  the  approach  is  different.  Instead 
of  refining  samples  by  filling  middle  point  between  two  contiguous  samples  at  a  finer 


(7.1) 


(7.2) 
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Figure  7-1.  Undecimated  version  of  two  channel  filter  bank  with  the  proposed 
over  complete  lifting. 


3 

{X'j j}  where  A,'  =  J  sn\ 

n  =  0 


Figure  7-2.  Forward  wavelet  transform  to  extract  detail  information  as  failures  to  be 
cubic. 
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level  using  subdivision  with  a  constant  spatial  support,  we  keep  the  size  of  the  samples 
constant  throughout  the  decomposition,  but  change  the  support  of  the  filter  associated  with 
the  interpolating  scaling  function. 

Figure  7-2  illustrates  how  to  compute  the  wavelet  coefficients  with  a  cubic  dual 
lifting  polynomial  (N  =  4).  In  this  figure,  we  show  how  the  samples  at  the  next  analyzing 
level  are  related  to  the  samples  at  the  finer  level  using  two  points  one  of  which  is  near  the 
left  boundary,  and  other  not.  The  predicted  wavelet  coefficients  contain  the  amount  by 
which  the  coarser  approximation  locally  fails  to  be  cubic.  Here  the  dual  lifting  coefficients 
s  =  {-0.0625,  0.5625,  0.5625,  -0.0625}.  The  coefficients  are  adjusted  near  the  boundaries 
of  the  finite  input  signal  to  avoid  boundary  effects.  For  N  =  6,  s  =  {0.0117,  -0.0977, 
0.5859,  0.5859,  -0.0977,  0.0117}  and  s  =  {-0.0024,  0.0239,  -0.1196,  0.5981,  0.5981,  - 
0.1196,  0.0239,  -0.0024}  for/V=  8. 

To  improve  the  performance  of  the  overcomplete  lifting  scheme,  we  use  the 
multiscale  block  algorithm  explained  in  Chapter  5.  In  the  block  algorithm  we  keep  the  size 
of  the  filter  kernels  unchanged  throughout  multiscale  analysis.  Instead,  we  partition  the 
input  sequence  into  two  sub-sequences  so  that  each  subsampled  sequence  contains  only 
those  pixels  affected  by  the  filter  kernel.  Figure  7-3  demonstrates  the  overcomplete  lifting 
scheme  with  block  algorithm  applied  to  a  sine  signal  with  discontinuities  near  the  middle. 
Figure  7-3  (d)  tells  us  that  there  are  no  boundary  effects  which  were  obviously  seen  in 
Figure  2- 1(b). 

For  images,  2-D  masks  can  be  used.  Figure  7-4(a)  is  a  mask  for  N  =  4  which  is 
used  for  regions  which  are  not  affected  by  the  boundaries.  Figure  7-4(b)  is  for  the  pixel  (0, 
0)  of  an  image. 
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Figure  7-3.  Implementation  of  the  overcomplete  lifting  scheme  with  multiscale  block 
algorithm  (a)  Original  signal  with  discontinuities  near  the  middle,  (b)  Blocked  input 
signal,  (c)  Blocked  wavelet  coefficients  after  lifting,  (d)  Unblocked  coefficients. 
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(b) 


Figure  7-4.  Examples  of  2-D  masks  of  dual  lifting  coefficients  for  N  =  4.  (a)  A  mask 
for  regions  which  are  not  affected  by  the  boundaries,  (b)  A  mask  for  the  point  (0,  0)  of 
an  input  image.  Rectangles  indicate  points  that  are  modified  by  masking  operations. 

The  proposed  algorithm  has  been  applied  to  a  mammogram  of  size  512  by  512.  We  used  a 
cubic  spline-based  filter  for  H  described  in  [20]  for  the  smoothing  lazy  wavelet  transform.  For  the 
dual  lifting,  we  applied  a  cubic  polynomial  with  coefficients  s  =  {-0.0625,  0.5625,  0.5625, 
-0.0625  }.  Figure  7-5  shows  wavelet  representations  for  three  levels  of  analysis.  We  used  the  wavelet 
coefficients  to  reconstruct  an  image  with  an  error  on  the  order  of  1 1.3dB. 

7  ?  Computational  Cost  nf  Ovp.rmmplp.tp.  1  .ifting  Scheme 

The  computation  of  the  costs  of  undecimated  DWT's  is  different  from  those  of  decimated 
ones.  In  some  of  the  decimated  DWT  algorithms,  filter  lengths  remain  unchanged  as  the  analyzing 
level  /  changes,  while  the  input  lengths  are  decimated  by  7J~ 1  where  =  1,  2, J.  However  in 
undecimated  DWT's  the  input  length  remains  unchanged,  while  the  filter  lengths  are  upsampled  by 
7J- 1 .  In  this  section,  we  compute  total  costs  of  the  various  DWT's  for  overcomplete  representations 
by  counting  all  input  points  at  each  level  of  analysis  with  different  input  lengths.  And  we  shall 
compare  them  to  see  which  algorithm  is  most  efficient. 

Let  N  and  L  be  the  octave,  the  input  length  and  the  filter  length,  respectively,  where 
L  =  2" .  For  convenience'  sake,  let  us  denote  by  ODWT(X)  an  overcomplete  DWT  of  an  X 
algorithm.  We  showed  in  Chapter  4  that  the  number  of  operations  per  input  point  in  an  elementary 


Figure  7-5.  Wavelet  coefficients  of  a  mammogram  with  the  proposed  scheme.  The  first  and  second 
column  show  the  wavelet  coefficients  in  horizontal  direction  and  vertical  direction,  respectively. 
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cell  for  the  direct  filter  bank  algorithm  is  2L  -  1  (4.2).  However  the  filter  length  L 
changes  to  2J~X{L-  1)  +  1  as  the  analyzing  level  ;'  increases.  Thus  if  we  denote  by 
On  L(direct)  the  total  complexity  of  ODWT(direct)  with  N  =  2n  input  points  at  the 
y'th  octave  with  length-L  filters,  then 

C{  L(Direct)  =  2n(2>(L  -  1)  +  1)  .  (7.3) 
In  the  FFT-based  algorithm,  the  filter  length  does  not  change  so  that  L  =  N  =  2n 
throughout  the  analysis.  The  cost  (4.17)  remains  unchanged  for  every      Thus  the 
computational  cost  of  OD  WT(FFT-based)  at  the  ;  th  octave  with  2"  inputs  is 

On  L{FFT-based)  =  2n  +  l(4n  -  3)  +  32  .  (7.4) 
The  cost  of  the  traditional  lifting  scheme  described  in  Chapter  6  was  L  +  1 .  Since 
the  filter  length  also  changes  to  2J-l(L-\)  +  l    as  /   increases,  the  cost  of 
ODWT(Lifting)  at  the  octave  /  with  2"  inputs  is  thus  given  as 

C{L(Lifting)  =  2"(2/-'a-l)  +  2)  .  (7.5) 
The  cost  of  the  lifting  scheme  (7.5)  can  be  reduced  using  the  block  algorithm 
described  in  Chapter  5.  The  basic  idea  of  the  block  algorithm  was  to  keep  the  size  of  the 
filter  kernels  unchanged  throughout  multiscale  analysis.  Instead,  we  partition  the  input 
sequence  into  two  sub-sequences  so  that  each  subsampled  sequence  contains  only  those 
pixels  affected  by  the  filter  kernel.  Thus  for  every  ,  the  filter  length  is  unchanged  and  it  is 
L.  Although  we  split  the  input  of  length  2n~i  into  two  subsampled  inputs  of  length 
2»-(/-l)  at  the  octave  / ,  the  total  number  of  points  involved  at  each  level  of  analysis  is 
unchanged  since  at  the  next  level  ;  -  1  the  number  of  blocks  is  D~ 1 ,  and  thus  the  total 
points  are  2J-1  •  2n-0-!)  =  2".  Therefore  the  computational  cost  of  ODWT(Lifting) 
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with  the  block  algorithm  (i.e.,  ODWT(Liftingb[ock) )  at  the  yth  octave  is  given  as 

C{L(Liftingblock)  =  2»(L+1).  (7.6) 
In  order  for  quantitative  comparison  between  the  complexities  given  by  (7.3), 
(7.4),  (7.5)  and  (7.6),  we  assign  practically  reasonable  values  to  the  variables,  ;',  n  and  L 
as  follows:  Since  multiscale  analysis  generally  provides  enough  information  up  to  the  fifth 
octave,  we  have 

;  =  {x\xe  {1,2,3,4,5,6,7,8,9}}  . 
A  large  percentage  of  images  we  process  for  analysis  has  lengths  of  128,  256  and  512. 
Thus 

n  =  {jc|  xe  {7,8,9}}  . 
Since  practical  applications  generally  use  short  filters, 

L  =  {jc|  *e  {2,3,5,7}}  . 

Figure  7-6  shows  the  computational  costs  of  ODWT(Direct) , 
ODWT(FFT-based) ,  ODWT(Lifting) ,  and  ODWT(Liftingblock)  with  different  filter 
lengths  L  =  2 ,  3,  5  and  7  with  n  =  7 .  Figure  7-7  and  Figure  7-8  show  the  same  costs  as 
Figure  7-6  with  different  input  lengths  n  =  8  and  9,  respectively.  Figure  7-9  shows  the 
complexities  with  analyzing  levels  up  to  7  =  3  ,  4,  5  and  6.  Here  n  =  1 .  Figure  7-10  and 
Figure  7-11  with  n  =  8  and  9,  respectively. 

Since  the  lifting  scheme  can  speed  up  the  wavelet  transform  by  a  factor  of  two  and 
convolution  using  a  block  algorithm  can  further  reduces  processing  time  by  two  fold 
within  each  analyzing  level,  the  proposed  algorithm  can  achieve  an  overall  speedup  of  4 
over  the  standard  overcomplete  wavelet  transform  where  an  FFT  is  the  basic 
implementation  tool. 


88 


O  ODWT(Direct) 
-B—  ODWT(FFT-based) 

ODWT(Lifting) 
-e-  ODWT{Liftinghlock) 


1  23456789 
analyzing  level  (j) 

(a)  L  =  2 


23456789 
analyzing  level  (j) 

(c)L  =  5 


1  23456789 
analyzing  level  (j) 

(b)  L  =  3 


cost 


900000 
800000 


700000 


1  23456789 
analyzing  level  (/') 

(d)L  =  7 


Figure  7-6.  Costs  of  OD WrfDirecfj,  ODWT(FFT-based),  ODWT(Lifting)  and  ODWT 
( Lifting block)  algorithms  with  filter  length  (a)  L  =  2,  (b)  L  =  J,  (b)  L  =  5,  and  (d)  L  =  7. 
Heren  =  7  (N  =  128). 
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Figure  7-7.  Same  as  Figure  7-6  with  n  =  8(N  =  256). 
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Figure  7-8.  Same  as  Figure  7-6  with  n  =  9  (N  =  512). 
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Figure  7-9.  Costs  of  ODWT(direct),  ODWT(FFT-based),  ODWT(lifiing)  and  ODWT 
( lifting block)  algorithms  up  to  analyzing  levels  (a)  J  =  3,  (b)  J  =  4,  (b)  7  =  5,  and  (d)  J  = 
6.  Here  n  =  7(N=  128). 
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Figure  7-10.  Same  as  Figure  7-9  with  n  =  8(N=  256). 
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Figure  7-11.  Same  as  Figure  7-9  with  n  =  8(N  =  512). 
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Based  on  the  cost  analysis  we  made  in  this  section,  we  now  conclude  that  our 
approach  for  computing  overcomplete  wavelet  representations  with  compactly  supported 
lifting  filters  ODWT(Liftingblock),  compared  to  existing  algorithms,  is  significantly  more 
efficient. 


CHAPTER  8 
CONCLUSIONS 

In  this  dissertation  we  showed  that  the  lifting  scheme  can  be  used  for  overcomplete 
multiscale  wavelet  representations  with  better  performance  without  introducing  boundary 
artifacts  that  exists  in  the  traditional  processing  methods.  Combined  with  the  multiscale 
block  algorithm  with  support  from  both  the  hardware  (SX  frame)  and  the  software  (XIL 
imaging  library),  the  new  proposed  lifting  scheme  helped  an  overcomplete  wavelet 
transform  to  achieve  an  overall  speedup  of  four  over  the  traditional  methods  where  an  FFT 
is  the  basic  implementation  tool.  We  showed  that  the  significantly  improved  performance 
of  the  proposed  method  makes  undecimated  wavelet  transforms  for  feature  analysis  very 
viable  in  interactive  processing  paradigms  such  as  a  Web-based  client-server  model  where 
the  fast  processing  is  highly  desirable. 

We  also  illustrated  that  the  basis  functions  used  in  our  approach  are  well  adapted  to 
the  boundary  of  a  finite-length  input,  so  that  the  boundary  distortions  introduced  by  signal 
extensions  do  not  appear.  This  property  is  very  useful  in  some  applications  on  general 
domains  and  in  analyzing  data  on  curves  and  surfaces.  We  also  showed  that  the  inverse 
transform  led  to  a  perfect  reconstruction  of  the  original  input. 
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